Preface



This volume contains selected papers presented at the Second Asia-Pacific Con- 
ference on Simulated Evolution and Learning (SEAL’98), from 24 to 27 Novem- 
ber 1998, in Canberra, Australia. SEAL’98 received a total of 92 submissions (67 
papers for the regular sessions and 25 for the applications sessions). All papers 
were reviewed by three independent reviewers. After review, 62 papers were ac- 
cepted for oral presentation and 13 for poster presentation. Some of the accepted 
papers were selected for inclusion in this volume. SEAL’98 also featured a fully 
refereed special session on Evolutionary Computation in Power Engineering or- 
ganised by Professor Kit Po Wong and Dr Loi Lei Lai. Two of the five accepted 
papers are included in this volume. 

The papers included in these proceedings cover a wide range of topics in 
simulated evolution and learning, from self-adaptation to dynamic modelling, 
from reinforcement learning to agent systems, from evolutionary games to evo- 
lutionary economics, and from novel theoretical results to successful applications, 
among others. 

SEAL’98 attracted 94 participants from 14 different countries, namely Aus- 
tralia, Belgium, Brazil, Germany, Iceland, India, Japan, South Korea, New Zea- 
land, Portugal, Sweden, Taiwan, UK and the USA. It had three distinguished 
international scientists as keynote speakers, giving talks on natural computation 
(Hans-Paul Schwefel), reinforcement learning (Richard Sutton), and novel mod- 
els in evolutionary design (John Gero). More information about SEAL’98 is still 
available at http://www.cs.adfa.edu.au/conference/seal98/. 

A number of people have helped to make the conference a great success. 
They include our secretaries: Alison McMaster, Jodi Wood and Kaylene Tulk, 
and students: Ko-Hsin Liang, Jason Bobbin, Thomas Runarsson and Chi-Wu 
Chou. We would like to take this opportunity to express our sincere thanks to 
them. 

Natural Computation

Abstract. The idea of mimicking processes of organic evolution on computers
and using such algorithms for solving adaptation and optimization
tasks can be traced back to the Sixties. Genetic Algorithms (GA), Evo- 
lutionary Programming (EP), and Evolution Strategies (ES), the still 
vivid different strata of this idea, have not only survived until now, but 
have become an important tool within what has been called Computa- 
tional Intelligence, Soft Computing, as well as Natural Computation. An 
outline of Evolutionary Algorithms (EA — the common denominator 
for GA, EP, and ES) will be sketched, their differences pinpointed, some 
theoretical results summarized, and some applications mentioned. 



Abstract only. 

Yao et al. (Eds.): SEAL’98, LNCS 1585, p. 1, 1999.
Springer-Verlag Berlin Heidelberg 1999

Abstract. One of the well-known problems in evolutionary search for
solving optimization problems is premature convergence. General
constrained optimization techniques such as hybrid evolutionary
programming, two-phase evolutionary programming, and Evolian algorithms
are not safe from this problem in their first phase. To overcome it, we
apply the sharing function to the Evolian algorithm and propose the
multiple Lagrange multiplier method for the subsequent phases of
Evolian. The method develops Lagrange multipliers in each subpopulation
region independently and finds multiple global optima in parallel. The
simulation results demonstrate the usefulness of the proposed sharing
technique and the multiple Lagrange multiplier method.



1 Introduction 

This paper addresses the general constrained optimization problem for continu- 
ous variables defined as: 

Minimize $f(x)$ subject to constraints

$$g_1(x) \le 0, \ldots, g_r(x) \le 0, \quad h_1(x) = 0, \ldots, h_m(x) = 0 \tag{1}$$

where $f$ and the $g_k$'s are functions on $R^n$ and the $h_j$'s are functions on $R^n$ for
$m \le n$, and $x = [x_1, \ldots, x_n] \in R^n$.

One of the well-known problems in genetic search for solving general op- 
timization problem is the phenomenon called genetic drift [T]. In multimodal 
functions with equal peaks, simple evolutionary algorithms converge to only one 
of the peaks, and that peak is chosen randomly due to the stochastic variations
associated with the genetic operators. Evolutionary algorithms have been
criticized for this premature convergence, in which substantial fixation occurs at
the genotype level before sufficiently near-optimal points are obtained [2]. In the
same context, the main problem associated with constrained optimization techniques
such as hybrid evolutionary programming (EP) [2], two-phase EP (TPEP) [5],
and Evolian algorithms is premature convergence in the first phase.

To overcome the above problem, Goldberg and Richardson proposed a method 
based on sharing in Genetic Algorithms; the method permits a formation of 
stable subpopulations (species) of different strings - in this way the algorithm
investigates different peaks in parallel [^. 

Recently, Beasley, Bull, and Martin proposed a new technique, called sequen- 
tial niching [^, for multimodal function optimization, which avoids some of the 
disadvantages associated with the sharing method (e.g., the time complexity of the
fitness sharing calculations and a population size that should be proportional to
the number of optima). The sequential niching method is based on the following
idea: once an optimum is found, the evaluation function is modified to eliminate
this (already found) solution, since there is no interest in re-discovering the
same optimum again. In some sense, the subsequent runs of the genetic algorithm
incorporate the knowledge discovered in the previous runs.

However, the comparison of sequential niching methods and parallel
approaches by Mahfoud [7] indicates that parallel niching methods outperform
sequential ones with respect to parallelization, convergence speed, and population
diversity. Parallel methods, such as the sharing function, form and maintain
niches simultaneously within a population, and seem to have the potential to escape
extraneous attractors and to converge to the desired solutions.

Consequently, an improvement is expected if the sharing technique is incor- 
porated into the first phase of Evolian. In Evolian, the first phase is equivalent 
to the usual exterior penalty method, since Lagrange multipliers are set to zero. 
When there are constraints, the subsequent phases of Evolian should be applied. 
The existence of multiple peaks implies the need for multiple Lagrange multipliers,
since different local optima convey different Lagrange multipliers. Thus,
for subsequent phases, the Lagrange multipliers should be initialized in each
potential local optimum region.

In this paper, we investigate the usefulness of the sharing function in Evolian 
and propose the multiple Lagrange multiplier method for constrained optimiza- 
tion. 

2 Sharing function 

A sharing function determines the degradation of an individual’s fitness due to 
a neighbor at some distance d. A sharing function sh is defined as a function of 
the distance with the following properties: 

— $0 \le sh(d) \le 1$, for all distances $d$

— $sh(0) = 1$ and $\lim_{d \to \infty} sh(d) = 0$.

There are various forms of sharing functions which satisfy the above conditions.
In 1^, a sharing function is defined by a sharing parameter $\sigma_{share}$ for controlling
the extent of sharing, and a power law function $sh(d)$ having a distance metric
$d$ between two individuals as a parameter:

$$sh(d) = \begin{cases} 1 - (d / \sigma_{share})^{\alpha}, & \text{if } d < \sigma_{share}, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $\alpha$ is a constant determining the degree of convexity of the sharing
function. The sharing takes place by derating an individual’s fitness by its niche
count. The new (shared) fitness of the $i$th individual $x^i$ is given by
$eval'(x^i) = eval(x^i) / m(x^i)$, where $m(x^i)$ returns the niche count for a
particular individual $x^i$:

$$m(x^i) = \sum_{j=1}^{2N_p} sh(d(x^i, x^j)), \tag{3}$$

where $N_p$ is the parent population size and the sum is taken over the entire
population, including the individual itself. Consequently, if an individual $x^i$ is
all by itself in its own niche, its fitness value does not decrease ($m(x^i) = 1$).
Otherwise, the fitness is decreased in proportion to the number and closeness of the
neighboring points. As a result, this technique limits the uncontrolled growth of
particular species within a population. As a side benefit, sharing helps maintain
a more diverse population and a better (and less premature) convergence [2].
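
The niche-count computation can be illustrated with a short Python sketch. It is not the authors' code: it assumes the usual power-law sharing function (1 - (d / sigma_share)^alpha for d < sigma_share, and 0 otherwise), Euclidean distance between individuals, and a default alpha of 1; all function names are chosen here for illustration.

```python
import math


def sharing(d, sigma_share, alpha=1.0):
    """Power-law sharing function: 1 at d = 0, 0 for d >= sigma_share."""
    if d < sigma_share:
        return 1.0 - (d / sigma_share) ** alpha
    return 0.0


def niche_count(i, population, sigma_share, alpha=1.0):
    """Sum of sharing values between individual i and every individual,
    itself included, so an isolated individual has a niche count of 1."""
    return sum(sharing(math.dist(population[i], other), sigma_share, alpha)
               for other in population)


def shared_fitness(raw_fitness, population, sigma_share, alpha=1.0):
    """Derate each raw fitness value by the niche count of its individual."""
    return [f / niche_count(i, population, sigma_share, alpha)
            for i, f in enumerate(raw_fitness)]
```

With three one-dimensional individuals at 0.0, 0.05 and 1.0 and sigma_share = 0.1, the first two share a niche and have their fitness reduced, while the third keeps its raw fitness.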

Since the EP procedure deals with the minimization problem, the use of
fitness sharing in the EP loop is implemented as follows:

$$\phi'(x^i) = \phi(x^i) + \eta(t) \, (m(x^i) - 1),$$

$$\eta(t) = \tau_s \, (\bar{\phi} - \phi(x^1)) / 2N_p,$$

where $m(x^i)$ returns the niche count for a particular individual $x^i$ calculated by
equation (3). The adaptive parameter $\eta(t)$, which depends on the population
statistics at generation $t$, controls the rate at which the objective function is
increased in proportion to the niche count, normalized by the total population size
$2N_p$. The scale factor $\tau_s < 1.0$ is a positive constant, $\bar{\phi}$ is the average
objective function value of the current population, and $x^1$ is the best individual
in the population. When an individual $x^i$ is all by itself in its own niche
(niche count = 1), the last term of the shared objective function disappears and
the shared objective function is the same as the original one.

This shared objective function is used for the stochastic tournament selection
(step 5) in the standard EP implementation in [3, 4, 5], which proceeds as follows:

A selected number of pairwise comparisons over all individuals are conducted.
For each solution, $N_c$ randomly selected opponents are chosen
from the whole population with equal probability. In each comparison, if
the conditioned solution offers a smaller shared objective function value than
the randomly selected opponent, it receives a “win.”

It should be noted that this shared objective function applies only to the first
phase because of the computational burden of calculating all the niche counts.
Calculating the niche count of one individual requires $2N_p$ evaluations
of the Euclidean distance and the sharing function at each generation. In
addition, the number of competing opponents $N_c$ is set to $\min(2N_p - 1, 10)$ to
fit into the total population size $2N_p$ and to restrict the maximum competition
size to 10.
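
The shared objective and the win counting of the tournament can be sketched as follows. This is a hedged illustration, not the implementation of the cited EP procedure: the names are hypothetical, the scale factor defaults to 0.1, and opponents are drawn without replacement from the other individuals, which is an assumption (the text only says they are chosen from the whole population with equal probability).

```python
import random


def shared_objective(objective, niche_counts, tau_s=0.1):
    """phi'(x^i) = phi(x^i) + eta * (m(x^i) - 1) for a minimization problem."""
    size = len(objective)                 # total population size, 2 * Np
    mean_phi = sum(objective) / size      # average objective value
    best_phi = min(objective)             # objective value of the best individual
    eta = tau_s * (mean_phi - best_phi) / size
    return [phi + eta * (m - 1.0) for phi, m in zip(objective, niche_counts)]


def tournament_wins(shared, rng, max_opponents=10):
    """Count the wins of each solution against randomly chosen opponents."""
    size = len(shared)
    n_c = min(size - 1, max_opponents)    # number of opponents per solution
    wins = []
    for i, value in enumerate(shared):
        others = [j for j in range(size) if j != i]
        opponents = rng.sample(others, n_c)   # assumption: no repeated opponents
        wins.append(sum(1 for j in opponents if value < shared[j]))
    return wins
```

An individual that is alone in its niche (niche count 1) keeps its original objective value, so crowded individuals lose comparisons they would otherwise tie.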

To investigate the usefulness of this sharing technique, let us consider the
two functions presented in [2].

Problem #1:

Minimize $f_1(x) = -\sin^6(5.1 \pi x + 0.5)$.

Problem #2:

Minimize $f_2(x) = f_1(x) \cdot e^{-4 \ln 2 \, (x - 0.0667)^2 / 0.64}$.

With a population size of $N_p = 30$ and a maximum of 30 generations, the plots
of the resulting individuals, where only the first phase of Evolian is used, are shown
in Figure 1. The procedure of Evolian is omitted for brevity and the interested
reader is referred to [5].

As can be seen in Figure 1, without sharing, the first phase of Evolian, which
is simply an exterior penalty function method, cannot locate multiple optima.
With the help of the sharing function, it can locate individuals at local minima
in the search space. It is worth noting that the number of individuals in each
peak is approximately inversely proportional to the objective value of the peak 
0 . 





[Figure 1: four panels. (a) $f_1(x)$: population without sharing. (b) $f_1(x)$: population with sharing. (c) $f_2(x)$: population without sharing. (d) $f_2(x)$: population with sharing.]

Fig. 1. First phase run of Evolian with and without a sharing function.

3 Multiple Lagrange multipliers

When there are constraints, the subsequent phases of Evolian should be applied.
The existence of multiple peaks implies the need for multiple Lagrange multipliers,
since different local optima convey different Lagrange multipliers. Thus,
for subsequent phases, the Lagrange multipliers should be initialized in each
potential local optimum region (see Figure 2).

For this purpose, Evolian has a routine to determine the multiple peaks in the
current population space. Since we are interested only in the global minimum,
only the peaks attaining the global optimum are determined by the peak
determination algorithm. To ease this determination process, the individuals are
sorted in ascending order of the objective function value. The highly ranked
individuals are determined to be peaks if they have objective values near the
best one and also lie at a distance of at least the sharing parameter
$\sigma_{share}$ from the peak(s) found earlier.
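
A greedy version of this peak determination can be sketched as below. It is only one plausible reading of the description: "near the best one" is taken to mean an absolute difference of at most a tolerance `tol`, and the distance is assumed to be Euclidean; both choices and all names are assumptions made here.

```python
import math


def determine_peaks(population, objective, sigma_share, tol):
    """Pick well-separated individuals whose objective is close to the best."""
    # Sort indices in ascending order of the objective value (minimization).
    order = sorted(range(len(population)), key=lambda i: objective[i])
    best = objective[order[0]]
    peaks = []
    for i in order:
        if objective[i] - best > tol:
            break  # all remaining individuals are even further from the best
        # Accept only candidates far enough from every peak found so far.
        if all(math.dist(population[i], p) >= sigma_share for p in peaks):
            peaks.append(population[i])
    return peaks
```

For example, with candidates near (-0.7, 0.5) and (0.7, 0.5) that have almost equal objective values, both are reported as peaks, while a third point 0.02 away from the second one is absorbed into it.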

It should be noted that this algorithm extracts multiple peaks
with almost the same objective values as that of the best peak. For the function
$f_1$, the algorithm correctly determined multiple peaks, while only one peak was
determined for the function $f_2$, as it has only one global minimum. After the
determination of local peaks, Lagrange multipliers are assigned to each local
peak and are updated at the peak point according to the following update rule:

$$\lambda_k[t+1] = \lambda_k[t] + \epsilon s_t g_k^{+}(x[t]) \quad \text{and} \quad \mu_j[t+1] = \mu_j[t] + \epsilon s_t h_j(x[t]),$$

where $\epsilon$ is a small positive constant. Each local region undergoes, in parallel,
subsequent phases of Evolian until it meets the stopping criteria.
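
The per-region multiplier update (each inequality multiplier grows by epsilon * s_t * g_k^+(x), each equality multiplier by epsilon * s_t * h_j(x)) can be sketched as follows. The sketch assumes that g_k^+ denotes max(0, g_k), that s_t is the current penalty parameter, and that each peak region is stored as a dictionary; none of this is spelled out in the text, and the names are hypothetical.

```python
def update_multipliers(lam, mu, g_values, h_values, s_t, eps):
    """One update of the multipliers at a peak point x[t].

    g_values[k] = g_k(x[t]) for the inequality constraints,
    h_values[j] = h_j(x[t]) for the equality constraints.
    """
    # Assumption: g_k^+ is the violated part of g_k, i.e. max(0, g_k).
    new_lam = [l + eps * s_t * max(0.0, g) for l, g in zip(lam, g_values)]
    new_mu = [m + eps * s_t * h for m, h in zip(mu, h_values)]
    return new_lam, new_mu


def update_all_regions(regions, s_t, eps):
    """Update every peak region independently (could run in parallel)."""
    for region in regions:
        region["lam"], region["mu"] = update_multipliers(
            region["lam"], region["mu"], region["g"], region["h"], s_t, eps)
    return regions
```

Because every region carries its own multiplier vectors, two peaks with different constraint values end up with different multipliers after the same update.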




Fig. 2. In Evolian, the Lagrange multipliers are updated in parallel in local optimum 
subpopulation space. 

Now let us consider the following nonlinear constrained optimization problem

m-- 

Problem #3:

Minimize $f_3(x) = x_1^2 + (x_2 - 1)^2$
subject to $h(x) = x_2 - x_1^2 = 0$.

This problem has two global optima $(x_1, x_2) = (\pm 1/\sqrt{2}, 1/2)$. With the specific
parameter settings for Evolian given in Table 1, the results for 100 trials are
summarized with bar graphs in Figure 3.
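
The two stated optima of Problem #3 can be checked numerically. The sketch below, with names chosen here for illustration, substitutes the equality constraint x2 = x1^2 into the objective x1^2 + (x2 - 1)^2 and inspects the resulting one-dimensional function; it is a verification aid, not part of Evolian.

```python
import math


def f3(x1, x2):
    """Objective of Problem #3."""
    return x1 ** 2 + (x2 - 1.0) ** 2


def h(x1, x2):
    """Equality constraint of Problem #3 (feasible when zero)."""
    return x2 - x1 ** 2


def f3_on_constraint(x1):
    """Objective restricted to the feasible curve x2 = x1**2."""
    return f3(x1, x1 ** 2)


def restricted_gradient(x1):
    """Derivative of f3_on_constraint: 4*x1**3 - 2*x1."""
    return 4.0 * x1 ** 3 - 2.0 * x1


optima = [(s / math.sqrt(2.0), 0.5) for s in (-1.0, 1.0)]
```

Both points are feasible, have a vanishing restricted gradient, and share the objective value 3/4, which no point on a coarse grid along the constraint improves upon.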



Table 1. The specific parameter values for Evolian used for the function $f_3$.

Parameter           Value    Meaning
------------------  -------  ---------------------------------------
$N_p$               30       Population size
$\rho$              10”^     Error tolerance for EP
$N_g$               7        Generation tolerance for EP
$s_0$               1.0      Initial penalty parameter
$s_{max}$           10'’     Maximum penalty parameter
$\gamma$            3.0      Increasing rate of penalty parameter
$\sigma_{share}$    0.1      Sharing parameter
$\tau_s$            0.1      Sharing scale factor
^tol                0.05     Peak determination similarity parameter



From Figure 3, the frequency of forming stable subpopulations is found to be about
60% both with and without the sharing function. It can
be seen that the use of a sharing function in the first phase yields no significant
improvement over the case where the sharing function is not used.
The use of subsequent phases in Evolian leads to the formation of multiple
subpopulation regions, regardless of the use of the sharing function.

By investigating the bar graphs, it can be seen that the more peaks are
determined in the first phase, the more frequently the
solution converges to the optima. Thus the search for multiple subpopulation
regions is critical in finding the multiple global optima.

It can be said that the use of multiple Lagrange multipliers in multiple sub- 
population regions effectively searches for multiple global minima in parallel. 

[Figure 3: two bar charts of frequency versus number of peaks in 100 trials. (a) Evolian without sharing. (b) Evolian with sharing.]

Fig. 3. Results obtained by Evolian with or without sharing. The white bar indicates
that the solution converges to one optimum, while the grey bar indicates convergence
to two optima.



4 Summary 

After the first phase of the Evolian algorithm, the local minimum regions are
determined using the peak determination algorithm. By applying multiple
Lagrange multipliers to these subpopulation regions, the globalness of a local
solution can be improved. In addition, this subpopulation scheme is inherently
parallel, so the computation time would be greatly reduced if it were implemented
on a parallel machine. The experiments indicate that the use of multiple Lagrange
multipliers in multiple subpopulation regions effectively searches for multiple global
minima in parallel.

Abstract. Evolution Strategies (ES) are an approach to numerical optimization
that shows good optimization performance. However, according
to our computer simulations, ES shows different optimization performance
when a different lower bound of strategy parameters is adopted.
Our analysis indicates that this is caused by the premature convergence of strategy
parameters, although they are traditionally treated as “self-adaptive” parameters.
This paper proposes a new extended ES, called RES, in order
to overcome this brittle property. RES has redundant neutral strategy
parameters and adopts new mutation mechanisms in order to utilize the
effect of genetic drift to improve the adaptability of strategy parameters.
Computer simulations of the proposed approach are conducted using
several test functions.

Keywords: Evolution Strategies, Numerical Optimization, Strategy
Parameters, Neutrality, Robustness



1 Introduction 

Evolutionary computation has been widely recognized as a robust approach 
to various kinds of engineering optimization problems. There are three main 
streams in this field, i.e., Evolution Strategies(ES) [I], Genetic Algorithms (G A) 
13 and Evolutionary Programming(EP) |3. Especially, when we consider nu- 
merical optimization, ES gives us better results than the other two in many 
problems (for instance, El). Although ES has several formulations, the most recent
form is $(\mu, \lambda)$-ES, where $\lambda > \mu \ge 1$. $(\mu, \lambda)$ means that $\mu$ parents generate
$\lambda$ offspring through recombination and mutation in each generation. The best
$\mu$ offspring are selected deterministically from the $\lambda$ offspring and replace the
current parents. Elitism and stochastic selection are not used. This paper uses
ES without recombination, following Yao and Liu [12].

In ES, the strategy parameters, which roughly define the size of mutation, are
considered to be controlled by their own “self-adaptive” property. However, they
often converge before the global optimum is found, so that individuals can
practically no longer move to any better points. To avoid this behavior,
strategy parameters are constrained to be larger than a certain small positive
value $\epsilon$, i.e., the lower bound. However, according to our computer simulations,
which are described in detail in Section 4, the performance of ES not only depends
strongly on $\epsilon$ but also has a different optimal $\epsilon$ for each
problem. This suggests that ES should be applied to an optimization problem
with an $\epsilon$ that is carefully tuned to it.

The authors acknowledge financial support through the “Methodology of Emergent
Synthesis” project (96P00702) by JSPS (the Japan Society for the Promotion of
Science).

Recently, Liang et al. observed the same phenomenon in EP and pointed
out the importance of a careful setting of the lower bound.

This paper focuses on how to overcome this brittleness, which comes from the
insufficient “self-adaptive” property of strategy parameters. We propose
a new individual representation that has redundant neutral strategy
parameters for each active strategy parameter, so that they can accumulate various
genetic changes through generations using the effect of genetic drift. In
addition, new genetic mechanisms associated with this individual
representation are introduced in order to stochastically replace the current active
strategy parameter with one of those neutral strategy parameters. We call
the proposed approach “Robust-ES (RES)”. This idea originates from the
basic concept of operon-GA IBimn]. Operon-GA uses a redundant genotype and
new genetic operations so that each individual can generate genetic changes of
adaptive size, which contributes to autonomous diversity control in a population.

The rest of this paper is organized as follows. Section 2 formulates the
optimization problem and ES. Section 3 describes the proposed approach in detail.
Section 4 shows the results of our computer simulations. Finally, Section 5
concludes this paper.

2 Function Optimization by ES 

A global minimization problem can be formalized using a pair $(S, f)$, where
$S \subseteq R^n$ is a bounded set on $R^n$ and $f : S \to R$ is an $n$-dimensional real-valued
function. The objective is to find a point $x_{min} \in S$ such that $f_{min}$ is a global
minimum on $S$. That is to say:

$$f_{min} = \min_{x \in S} f(x), \quad x_{min} = \arg f_{min} \tag{1}$$

According to the description by Back and Schwefel [^, the computational pro- 
cedure of ES can be described as follows: 

1. Generate the initial population of $\mu$ individuals, and set $g = 1$. Each
individual is taken as a pair of real-valued vectors $(x_i, \eta_i)$, $\forall i \in \{1, \ldots, \mu\}$, where
$x_i$ and $\eta_i$ are the $i$-th coordinate value in $R^n$ and its strategy parameters
larger than zero, respectively.

2. Evaluate the objective value for each individual $(x_i, \eta_i)$, $\forall i \in \{1, \ldots, \mu\}$, of
the population based on the objective function $f(x_i)$.

3. Each parent $(x_i, \eta_i)$, $i = 1, \ldots, \mu$, creates $\lambda / \mu$ offspring on average, so that a
total of $\lambda$ offspring are generated. The offspring are calculated as
follows: for $i = 1, \ldots, \mu$, $j = 1, \ldots, n$, and $k = 1, \ldots, \lambda$,

$$\eta'_k(j) = \eta_i(j) \exp(\tau' N(0, 1) + \tau N_j(0, 1)) \tag{2}$$

$$x'_k(j) = x_i(j) + \eta'_k(j) N(0, 1) \tag{3}$$

where $x_i(j)$, $x'_k(j)$, $\eta_i(j)$ and $\eta'_k(j)$ denote the $j$-th component values of the
vectors $x_i$, $x'_k$, $\eta_i$ and $\eta'_k$, respectively. $N(0, 1)$ denotes a normally distributed
one-dimensional random number with mean zero and standard deviation
one. $N_j(0, 1)$ indicates that the random number is generated anew for each
value of $j$. The factors $\tau$ and $\tau'$ are commonly set to $(\sqrt{2 \sqrt{n}})^{-1}$ and
$(\sqrt{2n})^{-1}$ [2]. Notice that, when $\eta'_k(j)$ calculated by Equation (2) is smaller
than a small positive value $\epsilon$, i.e., the lower bound, $\epsilon$ is assigned to $\eta'_k(j)$.

4. Calculate the fitness of each offspring $(x'_i, \eta'_i)$, $\forall i \in \{1, \ldots, \lambda\}$, according to
$f(x'_i)$.

5. Sort the offspring $(x'_i, \eta'_i)$, $\forall i \in \{1, \ldots, \lambda\}$, in non-descending order according
to their fitness values, and select the $\mu$ best offspring out of $\lambda$ to be parents
of the next generation.

6. Stop if the halting criterion is satisfied; otherwise, $g = g + 1$ and go to step 3.
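
The six steps can be condensed into a small (mu, lambda)-ES sketch with log-normal self-adaptation and a lower bound on the strategy parameters. It is a simplified illustration under stated assumptions rather than the procedure used in the experiments: the initial strategy parameters are set to 3.0, parents are cycled through in turn so that each creates lambda/mu offspring, no bound handling is applied to the object variables, and the halting criterion is a fixed number of generations. All names are hypothetical.

```python
import math
import random


def es_minimize(f, n, bounds, mu=30, lam=200, eps=1e-4, generations=100, seed=0):
    """(mu, lambda)-ES without recombination, Gaussian mutation, lower bound eps."""
    rng = random.Random(seed)
    lo, hi = bounds
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    # Step 1: random initial points; initial strategy parameters are an assumption.
    parents = [([rng.uniform(lo, hi) for _ in range(n)], [3.0] * n)
               for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for k in range(lam):
            x, eta = parents[k % mu]                  # step 3: lam/mu children each
            common = tau_prime * rng.gauss(0.0, 1.0)  # shared by all components
            new_eta = [max(eps, e * math.exp(common + tau * rng.gauss(0.0, 1.0)))
                       for e in eta]                  # Equation (2) plus lower bound
            new_x = [xj + ej * rng.gauss(0.0, 1.0)
                     for xj, ej in zip(x, new_eta)]   # Equation (3)
            offspring.append((new_x, new_eta))
        # Steps 4-5: evaluate, sort, keep the mu best offspring as new parents.
        offspring.sort(key=lambda ind: f(ind[0]))
        parents = offspring[:mu]
    best = parents[0][0]
    return best, f(best)
```

Replacing the Gaussian draw in the line marked Equation (3) with a Cauchy-distributed number would give the FES variant discussed below; the lower bound `eps` is the parameter whose influence this paper studies.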



A key to successful optimization in any evolutionary computation (EC) is
diversity control. However, the appropriate diversity depends strongly on the
current state of the population and the landscape of the problem. If the population
converges too fast compared with the ruggedness of the landscape, a method
often cannot find the global optimum; on the contrary, if the convergence
is too slow, a large computational cost is required to find a global optimum.
Diversity control in EC is generally achieved by adjusting the balance between
reproduction and selection. However, we consider here only the reproduction at
Step 3 because ES treats the selection at Step 5 as a deterministic process.

Since ES uses not recombination but mutation as its primary operator, the
calculation of the mutation step size ($\eta_i(j) N(0, 1)$), which is traditionally
considered to be “self-adaptive”, can be modified to improve the optimization
performance. Kappler [Z] investigated the replacement of Gaussian mutation with
Cauchy mutation in (1+1)-ES, although no clear conclusions were obtained. Yao
and Liu mm proposed to replace Gaussian mutation with Cauchy mutation
for $(\mu, \lambda)$-ES, where Cauchy mutation uses the following Cauchy distribution
function:

$$F_t(x) = 1/2 + (1/\pi) \arctan(x/t) \tag{4}$$

where $t = 1$. They conducted empirical experiments using many test functions
to show the improvement in performance, especially on multimodal function
optimization problems. They called their approach Fast-ES (FES) in order to
distinguish it from classical ES (CES). The success of FES can be explained by
the fact that the population does not easily lose its global search ability through the
convergence of strategy parameters into local optima, because Cauchy mutation
generates longer jumps more frequently than Gaussian mutation. However, the
brittle property with respect to the change of $\epsilon$ still remains, as shown in
Section 4.
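
The claim that Cauchy mutation produces long jumps more often than Gaussian mutation can be checked directly from the distribution functions. The sketch below uses the Cauchy distribution function 1/2 + (1/pi) arctan(x/t) with t = 1 and the standard normal distribution function; the function names and the inverse-transform sampler are illustrative choices made here.

```python
import math
import random


def cauchy_cdf(x, t=1.0):
    """Cauchy distribution function with scale parameter t."""
    return 0.5 + math.atan(x / t) / math.pi


def gaussian_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cauchy_sample(rng, t=1.0):
    """Inverse-transform sampling: solve cauchy_cdf(x) = u for x."""
    u = rng.random()
    return t * math.tan(math.pi * (u - 0.5))


def tail_probability(cdf, a):
    """P(|X| > a) for a distribution that is symmetric around zero."""
    return 2.0 * (1.0 - cdf(a))
```

Comparing `tail_probability(cauchy_cdf, 3.0)` with `tail_probability(gaussian_cdf, 3.0)` shows that jumps longer than three units are far more likely under the Cauchy distribution.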

Fig. 1. The averaged best results of FES and RES for multimodal functions when the
lower bounds are $10^{-2}$, $10^{-4}$, $10^{-6}$, $10^{-8}$ and $10^{-10}$.



3 Robust ES 



When we apply ES to an optimization problem, it shows a similar evolving 
behavior to the other evolutionary algorithms: a simple search focus shift from a 
global region into a local region, which is derived from the gradual convergence 
of the population. This convergence is the direct effect of natural selection which, 
in a practical sense, makes strategy parameters small monotonically. This change 
has been considered as the process of “self-adaptation” . However, we assume here 
that this behavior is not adaptive enough to perform a robust search and thus 
ES can be extended from the viewpoint of giving more adaptability to strategy 
parameters. 

Based on this idea, we propose the following individual representation $X_i$:

$$X_i = (x_i(j), \eta_i(j, p)) \tag{5}$$

where $j = 1, \ldots, n$ and $p = 1, \ldots, m$. Notice that each $x_i(j)$ has $m$ strategy
parameters, whereas the traditional ES has only one strategy parameter. Its
offspring $X'_k = (x'_k(j), \eta'_k(j, p))$ is calculated in the following way. The component
values $x'_k(j)$ are calculated in the same manner as in FES:

$$x'_k(j) = x_i(j) + \eta'_k(j, 1) \delta_j \tag{6}$$

where $\delta_j$ is a random number calculated anew for each $j$ based on Cauchy
mutation. An individual $X_i$ has $n \times m$ strategy parameters, although only $\eta_i(j, 1)$
is used when its $x'_k(j)$ is calculated. Thus, we call $\eta_i(j, 1)$ active strategy
parameters and $\eta_i(j, p)$, $p = 2, \ldots, m$, inactive strategy parameters. They
replace each other and are modified by the following three operations, which
are applied stochastically:

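$$O_{dup}: \quad \hat{\eta}_i(j, 1) = \eta_i(j, 1) \tag{7}$$
$$\hat{\eta}_i(j, p) = \eta_i(j, p - 1), \quad \forall p \in \{2, \ldots, m\}$$
$$\eta'_i(j, p) = D(\hat{\eta}_i(j, p)), \quad \forall p \in \{1, \ldots, m\}$$

$$O_{del}: \quad \hat{\eta}_i(j, p) = \eta_i(j, p + 1), \quad \forall p \in \{1, \ldots, m - 1\} \tag{8}$$
$$\hat{\eta}_i(j, m) = \min\left(\eta_{max}, \sum_{p=1}^{m-1} \hat{\eta}_i(j, p)\right)$$
$$\eta'_i(j, p) = D(\hat{\eta}_i(j, p)), \quad \forall p \in \{1, \ldots, m\}$$

$$O_{inv}: \quad \hat{\eta}_i(j, 1) = \eta_i(j, p), \quad \exists p \in \{2, \ldots, m\} \tag{9}$$
$$\hat{\eta}_i(j, p) = \eta_i(j, 1)$$
$$\eta'_i(j, p) = D(\hat{\eta}_i(j, p)), \quad \forall p \in \{1, \ldots, m\}$$

where $D$ is the same mutation as Equation (2) with the lower bound $\epsilon$, and $\eta_{max}$
is a constant. That is to say, $O_{dup}$ shifts all of $\eta_i(j, p)$ into the adjacent position
$(p + 1)$ and then modifies them with $D$. $O_{del}$ discards $\eta_i(j, 1)$, shifts all the other
$\eta_i(j, p)$ into the adjacent position $(p - 1)$, inserts the smaller of $\eta_{max}$ and the
sum of the shifted parameters at the last position, and then modifies them with $D$.
$O_{inv}$ swaps $\eta_i(j, 1)$ with one of $\eta_i(j, p)$, $p \in \{2, \ldots, m\}$, and
then modifies them with $D$.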

In all other respects, the proposed RES has the same computational steps as CES or FES.
The only difference is that offspring are generated by
Equation (6) after applying $O_{dup}$, $O_{del}$ and $O_{inv}$ stochastically to each individual.
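
The three operations can be sketched for the list of m strategy parameters attached to a single variable, with index 0 playing the role of the active parameter. This is a hedged reading of the description, not the authors' implementation: in particular, the value refilled by the deletion operation is assumed to be the smaller of eta_max and the sum of the shifted parameters, the operator probabilities default to 0.6, 0.3 and 0.1 as in the simulations reported later, and all names are hypothetical.

```python
import math
import random


def mutate_eta(eta, tau, tau_prime, eps, rng):
    """D: log-normal mutation of the strategy parameters with lower bound eps."""
    common = tau_prime * rng.gauss(0.0, 1.0)
    return [max(eps, e * math.exp(common + tau * rng.gauss(0.0, 1.0)))
            for e in eta]


def o_dup(eta):
    """Duplicate the active parameter and shift the rest; the last one drops out."""
    return [eta[0]] + eta[:-1]


def o_del(eta, eta_max):
    """Discard the active parameter, shift the rest, and refill the last slot."""
    shifted = eta[1:]
    # Assumption: the refilled value is min(eta_max, sum of shifted parameters).
    return shifted + [min(eta_max, sum(shifted))]


def o_inv(eta, rng):
    """Swap the active parameter with a randomly chosen inactive one."""
    p = rng.randrange(1, len(eta))
    swapped = list(eta)
    swapped[0], swapped[p] = swapped[p], swapped[0]
    return swapped


def apply_operator(eta, eta_max, rng, probs=(0.6, 0.3, 0.1)):
    """Choose one of the three operations at random and apply it."""
    u = rng.random()
    if u < probs[0]:
        return o_dup(eta)
    if u < probs[0] + probs[1]:
        return o_del(eta, eta_max)
    return o_inv(eta, rng)
```

A full RES step for one variable would call `apply_operator` and then `mutate_eta` on the result, after which only the first entry is used as the mutation step size.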

4 Computer Simulations 

4.1 Test functions and Conditions 

Six test functions are listed in Table 1. They are the hypersphere function,
Schwefel’s problem 2.22, the step function, Rastrigin’s function, Ackley’s function and
Griewank’s function, respectively. Functions $f_1$ to $f_3$ are unimodal functions
and the other three, $f_4$ to $f_6$, are multimodal functions. All the functions are
defined in a 30-dimensional search space and have the global minimum $f_{i,min} = 0$
at $(0, \ldots, 0)$. The main purpose of our computer simulations is to show the effect
of the lower bound of strategy parameters $\epsilon$ on the optimization performance of FES
and RES. Both ESs use $(\mu, \lambda) = (30, 200)$, no correlated mutations and no
recombination. The upper bound of strategy parameters $\eta_{max}$ is set at 1.0 for
$f_4$ and 3.0 for the other functions. In the case of RES, $O_{dup}$, $O_{del}$ and $O_{inv}$ are
applied to an individual with probabilities of 0.6, 0.3 and 0.1, respectively.
The number of strategy parameters $m$ for each variable is set at 6. The six
functions were solved 50 times under the same initial conditions.

Table 1. Six test functions

Expression ($n = 30$)    Range
$f_1(x) = \sum_{i=1}^{n} x_i^2$    $-100 \le x_i \le 100$
$f_2(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|$    $-10 \le x_i \le 10$
$f_3(x) = \sum_{i=1}^{n} (\lfloor x_i + 0.5 \rfloor)^2$    $-100 \le x_i \le 100$
$f_4(x) = \sum_{i=1}^{n} \{ x_i^2 - 10 \cos(2 \pi x_i) + 10 \}$    $-5.12 \le x_i \le 5.12$
$f_5(x) = -20 \exp\left(-0.2 \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n} \sum_{i=1}^{n} \cos 2 \pi x_i\right) + 20 + e$    $-32 \le x_i \le 32$
$f_6(x) = \frac{1}{4000} \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$    $-600 \le x_i \le 600$
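
For reference, the six benchmark functions named above (hypersphere, Schwefel's problem 2.22, step, Rastrigin, Ackley and Griewank) can be written compactly in Python. The definitions follow the standard forms of these benchmarks; the function names and the `TEST_FUNCTIONS` dictionary are conveniences introduced here, not part of the paper.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def schwefel_2_22(x):
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def step(x):
    return sum(math.floor(v + 0.5) ** 2 for v in x)


def rastrigin(x):
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def ackley(x):
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)


def griewank(x):
    total = sum(v * v for v in x) / 4000.0
    product = math.prod(math.cos(v / math.sqrt(i))
                        for i, v in enumerate(x, start=1))
    return total - product + 1.0


# Function together with the lower and upper bound of every variable.
TEST_FUNCTIONS = {
    "f1": (sphere, -100.0, 100.0),
    "f2": (schwefel_2_22, -10.0, 10.0),
    "f3": (step, -100.0, 100.0),
    "f4": (rastrigin, -5.12, 5.12),
    "f5": (ackley, -32.0, 32.0),
    "f6": (griewank, -600.0, 600.0),
}
```

Every function evaluates to zero, up to floating-point rounding, at the origin of the 30-dimensional search space.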



4.2 Results 

Figure 2 compares the results of FES and RES for the unimodal functions $f_1$, $f_2$
and $f_3$. Figures 2(a), (c) and (e) show the results of FES. The effect of the lower
bound is observed for all the functions. For $f_1$, the best results were obtained
when $\epsilon$ was $10^{-2}$ at generation 200, $10^{-4}$ at 500 and $10^{-6}$ at 1000. Better
results were not obtained with the smaller lower bounds, i.e., $10^{-8}$ and
$10^{-10}$. In the case of RES, shown in Figure 2(b), better performance was obtained
when a smaller lower bound was adopted. A clearer difference between FES and RES
was observed for $f_2$ and $f_3$, as shown in Figures 2(c) and (d) and Figures 2(e) and (f).
FES for $f_2$ showed stagnation of performance in every case. As a result, the
best performance was 2.5 x 10“^ when $\epsilon$ = 10“®. In the case of RES, better results
were obtained as a smaller $\epsilon$ was adopted. In particular, RES reached 3.9 x 10“®.
FES for $f_3$ found the global minimum only when $\epsilon = 10^{-2}$ and
$10^{-4}$, as shown in Figure 2(e), while RES with any $\epsilon$ successfully optimized the
function without a large difference in computational cost.

[Figure 2: six panels. (a) FES for $f_1$. (b) RES for $f_1$. (c) FES for $f_2$. (d) RES for $f_2$. (e) FES for $f_3$. (f) RES for $f_3$.]

Fig. 2. The averaged best results of FES and RES for unimodal functions when the
lower bounds are $10^{-2}$, $10^{-4}$, $10^{-6}$, $10^{-8}$ and $10^{-10}$.

Figures 1(a) to (f) show the results for the multimodal functions $f_4$, $f_5$ and
$f_6$. Results similar to those for the unimodal functions were obtained. As shown in
Figure 1(a), FES showed stagnation in all cases, although the best result
of 1.9 at generation 1500 was obtained when $\epsilon = 10^{-4}$. In the case of RES,
better results were obtained with smaller $\epsilon$, as shown in
Figure 1(b). The results for $f_5$ in Figures 1(c) and (d) show the same tendency
as those for $f_4$. Figures 1(e) and (f) show the results for $f_6$. RES obtained
better and more robust results than FES, although both ESs showed
stagnation after about generations 300 and 500, respectively. However, looking
at the 50 trials when $\epsilon = 10^{-10}$, no trial of FES found the global optimum, while
RES found it successfully in 24 trials.

These results therefore suggest that FES should adopt a
carefully selected lower bound for each problem to obtain the best performance,
whereas RES can use a smaller lower bound without any concern about a decrease in
performance.

5 Conclusions 

This paper proposed an extended ES, called RES, that shows robust performance
against the lower bound of strategy parameters. Computer simulations were
conducted using several test functions in order to investigate the performance
of RES. Robust performance was confirmed on all six functions. Future
work will be directed to the detailed analysis of the evolving behavior in RES
and the application of the proposed approach to evolutionary programming.

Abstract. The p-median problem is an NP-complete combinatorial optimisation
problem well investigated in the fields of facility location and, more
recently, clustering and knowledge discovery. We show that hybrid optimisation
algorithms provide reasonable speed and high quality of solutions, allowing an
effective trade-off between the quality of the solution and computational
effort. Our approach to hybridisation is a tightly coupled approach rather than
a serialisation of hill-climbers with genetic algorithms. Our hybrid algorithms
use genetic operators that have some memory about how they operated in their
last invocation.



1 Introduction 

The p-median problem is a central facilities location problem that seeks the
location of p facilities on a network of n points minimising a weighted distance
objective function [10]. The problem is NP-complete and has a zero-one integer
programming formulation [23] with $n^2$ variables and $n^2 + 1$ constraints,
and many techniques have been developed to heuristically solve instances of the
problem [4,25,26,27,28,30]. For finding high quality approximate solutions,
hill-climbing variations of an interchange heuristic [4,9,16,30] are considered
very effective, but they risk being trapped in local optima. Other alternatives
have also been explored, amongst them Lagrangian relaxation, Tabu search
and Simulated Annealing. Recently the p-median problem has been identified
as a robust method for spatial classification, clustering and knowledge
discovery [15,17,19]. While facility location problems may involve perhaps
hundreds of points, knowledge discovery applications will face thousands of
points.

Genetic Algorithms (GAs) have been suggested as a robust technique for
solving optimisation problems. However, progress towards solving the p-median
problem displays a chronology analogous to the efforts to use GAs for solving
other combinatorial optimisation problems. On one side, we have the recent
records on optimally solving instances of the Travelling Salesman Problem (TSP)
with linear programming and local cuts, which overshadow the efforts to solve
the TSP with GAs. On the other side, we see GAs providing very good solutions
for the bin packing problem.

For the p-median problem, early attempts with GAs used direct binary encoding
and the results were discouraging. It was then accepted that GAs could
not compete with the efficient and well designed hill-climbing approaches used
for heuristically solving the p-median problem. More recently, the use of
integer encoding and some theory of set recombination has shown that genetic
algorithms could potentially become competitive. Although it is now clear that
integer encoding is better than binary encoding for the p-median problem, none
of the at least 5 crossover operators has been identified as the most
appropriate. Also, the adequate balance between quality of the approximate
solution (proximity to optimality) and computational effort has not been
established.

This paper argues that although GAs for the p-median problem have improved,
they hardly outperform hill-climbers. The trade-off of effort vs quality
slightly favours hill-climbers. However, occasionally, GAs happen to identify
solutions which are closer to optimality. In order to combine the efficiency of
hill-climbers with the potentially higher quality solutions of GAs, an approach
to hybridisation is proposed. We will discuss why this hybridisation is
challenging and present results that illustrate the benefits of this hybrid
approach.

2 The p-Median Problem 

In real D-dimensional space, the p-median problem is concerned with selecting
p stations out of the points $S = \{s_1, s_2, \ldots, s_n\}$ so that the sum of
the distances of all $s_i$ in $\mathbb{R}^D$ to their nearest station is
minimum. The problem is motivated by the 2D scenarios where the points are
sites that must be serviced by p stations selected from the sites. Naturally,
the distance from every site to its nearest station should be as small as
possible. In fact, the p-median problem has a formulation where the assignment
of stations minimises the expected distance for servicing a site from its
station. As an example, say that S are positions in the plane of potential
fires and we are to place p fire-fighting stations. Let $w_i$ be the probability
that site $s_i$ has a fire. Then, we seek to minimise

$$E[d(s_i, \text{station for } s_i)]
    = \sum_{i=1}^{n} \mathrm{Prob}[s_i \text{ has a fire}] \, d(s_i, \text{station for } s_i)
    = \sum_{i=1}^{n} w_i \, d(s_i, \text{station for } s_i).$$

The distance d could be the Euclidean metric or any other metric.

Note that, for any $C \subset S$ of size p, the station for $s_i$ is the site
in C nearest to $s_i$ and we will denote it as $\mathrm{rep}[s_i, C]$. That is,
$\mathrm{rep}[s_i, C]$ is the representative for $s_i$ in C and it satisfies
$\min_{s_j \in C} d(s_i, s_j) = d(s_i, \mathrm{rep}[s_i, C])$. In this context,
the p-median problem is a problem of finding the best set C of representatives,
so that every site is as similar as possible to its representative. This is how
the p-median problem translates into a clustering formulation where the sites
are partitioned into p groups. Each group is a set of sites with a common
representative. The expected dissimilarity between a site and its
representative is minimum. We write the p-median problem in this context as

$$\text{Minimise}_{C \subset S,\ |C| = p} \quad
    M(C) = \sum_{i=1}^{n} w_i \, d(s_i, \mathrm{rep}[s_i, C]). \qquad (1)$$
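
As an illustration only, the objective M(C) of Equation (1) can be evaluated directly from its definition. The sketch below assumes Euclidean distance and represents C as a set of site indices; the names `euclid`, `rep` and `M` and the four sample sites are hypothetical and chosen for this example.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(a, b)

def rep(i, C, sites, d=euclid):
    """Index in C of the representative (nearest member of C) of site i."""
    return min(C, key=lambda j: d(sites[i], sites[j]))

def M(C, sites, weights, d=euclid):
    """Weighted sum of distances from every site to its representative."""
    return sum(w * d(sites[i], sites[rep(i, C, sites, d)])
               for i, w in enumerate(weights))

# Hypothetical data: two tight pairs of sites on a line, equal weights.
sites = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
weights = [1.0, 1.0, 1.0, 1.0]

print(M({0, 2}, sites, weights))  # one representative per pair
print(M({0, 1}, sites, weights))  # both representatives in the same pair
```

Placing one representative in each pair gives a far smaller objective value than placing both in the same pair, which is the clustering reading of the problem.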



3 The Solution Methods 



There have been several formulations of the p-median problem as a 0/1-integer
programming problem. A trade-off between the type of constraints in the
formulation and the frequency by which integer solutions occur when solving the
relaxed linear programming problem allows the problem to be solved to optimality
when n is small (a few hundred sites) [28]. However, when n is moderately large,
the NP-completeness of the problem requires the use of heuristics to obtain
approximate solutions.

In what follows we discuss the most effective approaches from the literature,
emphasising similarities rather than differences, since we want to combine
their features to obtain methods that integrate their advantages. The most
popular type of heuristics are hill-climbers, rediscovered in many contexts and
best known as interchange heuristics [16,30]. The hill-climbing nature of these
heuristics is clearly revealed if we structure the search space of the p-median
problem as a graph with $\binom{n}{p}$ nodes. The nodes of this graph are all
$C \subset S$ with $|C| = p$. The edges of the graph are defined as follows:
two nodes $C$ and $C'$ are adjacent if and only if $|C \cap C'| = p - 1$; that
is, if they differ in exactly one representative. So, every node in the graph
is a feasible solution, and we seek to find the node that minimises $M(C)$ in
Equation (1). The hill-climber interchange heuristics start on a random
solution $C_0$ (a random node in the graph). Iteratively, the heuristic
explores a set $N(C_t)$ of adjacent nodes and moves to the best alternative in
this neighbourhood if the alternative is an improvement (i.e.
$M(C_{t+1}) < M(C_t)$). Thus, the new node $C_{t+1}$ is such that
$M(C_{t+1}) = \min_{C \in N(C_t)} M(C)$. The search halts when no better
solution is found in the neighbourhood $N(C_t)$. The interchange hill-climbers
offer different variants in how they define the set $N(C_t)$ of adjacent nodes
to explore. Complete hill-climbers and other hill-climbers have been shown not
to be as efficient [25] in finding a local optimum of high quality as an
original heuristic proposed in 1968 by Teitz and Bart. We will refer to this
heuristic as TAB.

In an amortised sense, in the TAB search only a constant number of neighbours
of $C_t$ are examined for the next interchange. When searching for a
profitable interchange, it considers the points in turn, according to a fixed
circular ordering $(s_1, s_2, \ldots, s_n)$ of the points. Whenever the turn
belonging to a point $s_i$ comes up, if $s_i$ is currently a representative, it
is ignored, and the turn passes to the next point in the circular list,
$s_{i+1}$ (or $s_1$ if $i = n$). If $s_i$ is not a representative point, then it
is considered for inclusion in the set of representatives. The most
advantageous interchange of non-representative $s_i$ and representative $s_j$
is determined, over all possible choices of $s_j \in C_t$. If
$C_t' = \{s_i\} \cup C_t \setminus \{s_j\}$ is better than $C_t$, then $C_t'$
becomes the new current solution $C_{t+1}$; otherwise, $C_{t+1} = C_t$. In
either case, the turn then passes to the next point in the circular list,
$s_{i+1}$ (or $s_1$ if $i = n$). If a full cycle through the set of points
yields no improvement, a local optimum has been reached, and the search halts.
The TAB heuristic forbids the reconsideration of $s_i$ for inclusion until all
other non-representative points have been considered as well. The heuristic
can therefore be regarded as a local variant of Tabu search. TAB's careful
design balances the need to explore a variety of possible interchanges against
the 'greedy' desire to improve the solution as quickly as possible.
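
The interchange procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: sites are coordinate tuples, distance is Euclidean, and the objective is recomputed from scratch for every candidate swap rather than updated incrementally. The names `cost` and `tab_search` are hypothetical.

```python
import math
import random

def cost(C, sites, weights):
    """M(C): weighted distance of every site to its nearest representative."""
    return sum(w * min(math.dist(s, sites[j]) for j in C)
               for s, w in zip(sites, weights))

def tab_search(sites, weights, p, rng=random):
    """Teitz-and-Bart-style interchange hill-climber (illustrative sketch)."""
    n = len(sites)
    C = set(rng.sample(range(n), p))      # random starting solution C_0
    best = cost(C, sites, weights)
    i = 0                                 # position in the circular ordering
    idle_turns = 0                        # consecutive turns without a swap
    while idle_turns < n:                 # a full idle cycle: local optimum
        improved = False
        if i not in C:
            # Find the representative j whose replacement by i helps most.
            out, out_cost = None, best
            for j in C:
                c = cost((C - {j}) | {i}, sites, weights)
                if c < out_cost:
                    out, out_cost = j, c
            if out is not None:
                C = (C - {out}) | {i}
                best = out_cost
                improved = True
        idle_turns = 0 if improved else idle_turns + 1
        i = (i + 1) % n                   # the turn passes to the next point
    return C, best

# Hypothetical data: two tight pairs of sites on a line, p = 2.
sites = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
weights = [1.0, 1.0, 1.0, 1.0]
C, value = tab_search(sites, weights, 2, random.Random(0))
print(sorted(C), value)
```

Because the scan position `i` always advances, a point that has just been tried is not reconsidered until every other point has had its turn, which is the behaviour the text attributes to TAB.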

The time required to compute $M(C')$ on an adjacent node $C'$ of $C$ is $O(n)$:
$O(n)$ steps to find $\mathrm{rep}[s, C']$ for all $s \in S$
($\mathrm{rep}[s, C']$ is either unchanged or the new representative $s_i$), and
$O(n)$ to compute $M(C')$ as defined in Equation (1). Therefore, the time
required to test the points of $C_t$ for replacement by $s_i$ is $O(pn)$. In
most situations, p can be viewed as a small constant, and thus the test can be
considered to take linear time.

Simulated Annealing can be considered a hill-climber that may accept solutions
$C_{t+1}$ with $M(C_{t+1}) > M(C_t)$. It also starts with a random $C_0$ and
iteratively redefines a current solution $C_t$. Not all $p(n - p)$ neighbours
of $C_t$ are explored; instead, they are sampled. A temperature T works as a
tolerance parameter for accepting a $\Delta_t = M(C_{t+1}) - M(C_t)$. When
$T > \Delta_t$ a worse solution may be probabilistically accepted as the
current solution. The value of T decreases as more solutions are explored
($t \to \infty$). Although Simulated Annealing opens the possibility of better
approximation because it escapes local optima, its computation time is much
larger than that of hill-climbers and it demands tuning of more parameters.
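
To make the role of the temperature concrete, the sketch below uses the widely used Metropolis acceptance rule together with a geometric cooling schedule. Both are assumptions made for illustration; the text does not specify which acceptance rule or schedule was used, and the names `accept` and `temperature` and the default values are hypothetical.

```python
import math
import random

def accept(delta, T, rng=random):
    """Metropolis-style rule: always accept an improvement (delta <= 0);
    accept a worse neighbour with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / T)

def temperature(t, T0=1.0, alpha=0.95):
    """Geometric cooling T_t = T0 * alpha**t (an assumed schedule)."""
    return T0 * alpha ** t

# As T falls, the same deterioration becomes less and less acceptable.
for t in (0, 50, 200):
    T = temperature(t)
    print(t, T, math.exp(-0.1 / T))
```

The tuning burden mentioned in the text shows up here as the extra parameters `T0` and `alpha`, which a plain hill-climber does not need.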

Genetic algorithms maintain a population of chromosomes (encodings) of 
feasible solutions. New populations are built from previous ones by genetic oper- 
ators. Simulated Annealing is very similar to a genetic algorithm with population 
size 1 and a specific mutation operator. However, populations provide “implicit
parallelism”. This means that the solutions in the current population are
simultaneous samples of subspaces of the search space. Thus, the GA is exploring
combinations of subspaces simultaneously and balances allocating chromosomes 
in subspaces of observed good performance with exploring other regions of the 
search space. We can see a progression of robustness in the methods. 

4 The Structure of the Hybrid GA 

The GA proposed here seeks to find a set C of representatives that optimises
Equation (1). Thus, Equation (1) defines the objective function. Genetic
operators and the encoding of feasible solutions are tightly related. The
literature has reached consensus that, because feasible solutions are subsets
$C \subset \{s_1, \ldots, s_n\}$ with p elements and $p \ll n$, the chromosomes
are strings of p different positive integers less than n. This encoding has
some redundancy since the same integer values in different order represent the
same feasible solution. However, empirical evidence shows that no significant
improvement in performance is obtained by choosing some canonical form (for
example, keeping the integer values sorted in ascending order within the
chromosome), while the extra bookkeeping slows down the search. Moreover, some
genetic operators depend on this redundancy for preserving diversity in the
population. For example, the Template Operator would produce offspring
identical to their parents if a canonical form is enforced; however, when
representing subsets C as strings of p integers, the operator can produce
offspring that encode a phenotype different from that of their parents even
though the parents have the same phenotype. Simple crossover on integer strings
[1] does not guarantee different integer values in the offspring (thus a
penalty is applied when evaluating infeasible solutions). B. Bozkaya et al. [2]
define 4 crossover operators, none of which is a definite winner. They are
progressively more complex versions of simple crossover on integer strings that
ensure that values are not replicated within a chromosome. As the operator's
complexity increases, there is a small improvement in the search but at more
computational effort per crossover. Since crossover occurs in the inner-most
loops of the GA's program, slightly more complex crossover operators rapidly
raise the demand on computational effort. Also, all the crossover operators
mentioned above have a strong physical bias. That is, the possible offspring of
two parents do not have uniform probability and the distribution is closely
related to the encoding, rather than to the semantics of the chromosome in the
problem. Operators that use the theory of Random Assorting Recombination
[20,21] have also been proposed [6]. These operators balance two desirable
properties of crossover in GAs, assortment and respect.
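
The feasibility issue with simple crossover on integer strings can be seen in a small example. The sketch below is illustrative only; the function names and the two parent chromosomes are hypothetical.

```python
def one_point_crossover(a, b, cut):
    """Simple one-point crossover on two integer strings of equal length."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def is_feasible(chrom):
    """A chromosome encodes a p-subset only if its p values are distinct."""
    return len(set(chrom)) == len(chrom)

parent1 = [3, 7, 12, 20, 41]
parent2 = [20, 5, 3, 9, 30]
child1, child2 = one_point_crossover(parent1, parent2, cut=2)

print(child1, is_feasible(child1))   # value 3 appears twice
print(child2, is_feasible(child2))   # value 20 appears twice

# Redundancy of the encoding: a different order, the same subset.
print(set([3, 7, 12, 20, 41]) == set([41, 20, 12, 7, 3]))
```

Both parents are feasible, yet both children repeat a value, so they no longer encode a set of p distinct representatives. This is the situation in which either a penalty or a more elaborate duplicate-avoiding operator is needed.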

The goal here is not to argue in favour of one or another crossover operator
because, beyond using chromosomes that are integer strings, it is not clear
that the added complexity of more sophisticated crossover is worth it.
Moreover, more complex operators typically imply more parameters that the end
user must properly set at the start of the optimisation (a challenging task in
itself). We argue that a more effective search is obtained by a hybrid
optimisation algorithm with a rather simple and fast crossover than by a GA
whose genetic operators are complex, heavily parameterised and difficult to
use. The operator provided by the TAB hill-climber takes a feasible solution
$C_t$ and improves it to a new solution $C_{t+1}$. We consider this a mutation
operator and incorporate it as such in the GA. This mutation operator may
appear computationally costly at first sight but, as we discussed in the
analysis of TAB, it typically requires $O(n)$ time (or $O(n^2)$ time if a local
optimum is reached). However, evaluating the objective function requires
$O(n)$ time, thus $O(n)$ time is required every time an unseen individual comes
into play. Since the TAB mutation is applied at a low rate, it occurs
sparingly per generation and its cost is well amortised in the genetic program.

The second aspect is that the hybridisation must be a tight integration that
must preserve those aspects of TAB that make it the most effective hill-climber.
Thus, a non-representative $s_i$ that fails to become a representative must be
banned until all other non-representatives have also been attempted. Our
mutation operator remembers the index i where the last promotion of an $s_i$ to
a representative took place. Moreover, it can detect local optima, and in this
case, they are typically of high quality. By adjusting the mutation rate (and
the population size) the hybrid can be configured to resemble TAB or to be more
independent (TAB is the case of population size 1 and mutation rate 1).

Table 1. TAB and GA optimisation with respect to objective function evaluations.

Method     Values of solution found                             Average        Evaluations
TAB        11.75 (3 times) 11.77 11.78 (4 times) 11.86 12.08    11.89 ± 0.23   804 ± 14
Hybrid GA  11.67 (6 times) 11.68 11.71 (3 times)                11.68 ± 0.01   5,897 ± 35

Table 1 shows the performance of TAB and a GA on a p-median problem with
100 data points and p = 5. The computational effort is measured as the number
of evaluations of the objective function, a uniform measure of resources required.
Since TAB and the GA are randomised, we show results over 10 executions. TAB
is typically much faster than the optimisation with GAs, but risks getting stuck
in local optima. The best solution with TAB is worse than the poorest solution
with the hybrid GA. The GA has a population of 25 chromosomes. For the test
problem, smaller population sizes result in poor performance of the GA. As we
already mentioned, hybridisation is complicated because TAB's operation as a
mutation that does a hill-climbing step needs to remember how it operated in
its previous invocation. However, even if we remember the last point attempted
to be promoted, this may be for a very different chromosome than the one that
is now being mutated. Remembering the context of mutation per chromosome is
not a successful alternative because it makes the GA work as many concurrent
TAB searches, each disrupted (rather than helped) by crossover. We have found
that this results in much computational effort and no better search.
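
One way to picture a mutation operator with memory is as an object that keeps a single scan position shared by all chromosomes, rather than one position per chromosome. The sketch below is an illustration under that assumption; the class name `TabMutation`, its attributes and the sample data are hypothetical, and chromosomes are plain lists of site indices.

```python
import math

class TabMutation:
    """TAB-style mutation that remembers, between invocations, where the
    circular scan over candidate points last stopped."""

    def __init__(self, sites, weights):
        self.sites, self.weights = sites, weights
        self.next_index = 0               # memory kept across calls

    def cost(self, chrom):
        return sum(w * min(math.dist(s, self.sites[j]) for j in chrom)
                   for s, w in zip(self.sites, self.weights))

    def __call__(self, chrom):
        """One hill-climbing step, resuming the scan at the stored index."""
        n = len(self.sites)
        current = self.cost(chrom)
        for _ in range(n):
            i = self.next_index
            self.next_index = (i + 1) % n
            if i in chrom:
                continue                  # already a representative: skip
            # Try replacing each gene in turn by the candidate point i.
            trials = [chrom[:k] + [i] + chrom[k + 1:]
                      for k in range(len(chrom))]
            best = min(trials, key=self.cost)
            if self.cost(best) < current:
                return best               # promotion of point i succeeded
        return chrom                      # full cycle, no gain: local optimum

sites = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
mutate = TabMutation(sites, [1.0, 1.0, 1.0, 1.0])
child = mutate([0, 1])
print(child, mutate.cost(child), mutate.next_index)
```

A second call on a different chromosome resumes from `next_index`, so the scan order is shared across the population; returning the chromosome unchanged after a full cycle is how the operator signals a local optimum.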

Another problem that complicates hybridisation is that the chromosome mutated
by TAB may exhibit an above average fitness with respect to the rest of the
population, moving sharply in the direction of a local optimum. This has the
effect that the chromosome dominates the population and the GA converges early
to a local optimum. This is a problem of diversity loss. Thus, we found that
the population size of our hybrids cannot be very small. In the results of
Fig. 1, where the hybrid improves on the performance of the GA, the hybrid's
population size is 15 while the GA's is 50. Smaller population sizes in the
hybrid result in poor optimisation performance of the hybrid, and larger
population sizes result in as many or more function evaluations than the simple
GA. We also found that although the mutation rate of the TAB mutation must be
small, it must have some impact when it occurs, otherwise it just looks like a
costly random mutation. For this reason, when a TAB-mutation occurs, two
hill-climbing steps on the chromosome are performed (and not just one). More
hill-climbing steps over-fit one individual with respect to the population.

5 Final Remarks

We have presented a hybrid genetic algorithm that incorporates exploitation
characteristics of a hill-climber into the GA program. This approach applies
with all crossover operators but it demands delicate balancing of the impact of
hill-climbing so that the hybrid GA avoids early convergence. As a result, the
hybrid GA optimisation is robust and effective and balances effort/quality of
solution better than the plain GA.




Fig. 1. Hybrid GA and GA optimisation with respect to objective function evaluations.
Final average for hybrid GA is 3.54 with a standard deviation of 0.03 while the final
average for the GA is 3.60 with a standard deviation of 0.07.

Abstract. The method of smoothing surfaces by correcting reflection lines,
which is commonly used in the car design industry, relies heavily on the
experience of designers and often involves very tedious work. This paper
discusses how genetic algorithms can be used to alleviate this problem by
providing alternative solutions under suitable constraints set by designers.
Strategies for designing genetic codes, fitness functions, and crossover and
mutation methods are investigated, with the aim of making the surface smoothing
process more intuitive while leaving designers with a greater choice.

Keywords: surface smoothing, reflection line, genetic algorithm, aes- 
thetic constraints 



1 Introduction 

The problem of smoothing free form surfaces is very important in many industries
where it is essential to produce surfaces which are visually pleasing. In
particular, in car body design, undesirable bumps, oscillations or wiggles
should be identified and corrected to obtain smoother lines. One common method
deployed by the automobile industry uses reflection lines, which are obtained by
reflecting a family of light sources along parallel straight lines on a surface
and viewing from a fixed position. Irregularities on these reflection lines
are then examined and corrected, and the surface becomes smoother as a result.

A common practice for a designer is to identify the part of the surface with
irregularities and to manually correct reflection lines along some specific
directions. The resultant changes to this part of the surface are then
calculated. These tasks may be performed iteratively until the designer is
satisfied. This process is often very tedious and, since it is difficult to
predict the effects that corrected reflection lines have on the surface, the
decision on how to perform the corrections relies heavily on the experience of
the designer. Another drawback is that this process does not produce a unique
solution, nor a number of alternative solutions. What would be desirable is an
automated scheme that can generate different corrections to reflection lines to
produce possible solutions that are capable of satisfying not only smoothness
but also other constraints specified by the designer (e.g. orientations and
maximum distance for correction vectors). These alternative solutions would
allow the designer freedom to choose the one with surface characteristics
perceived as the best in some way, for example, in terms of aesthetics. We
propose to achieve this by using evolutionary programming methods based on
genetic algorithms.

Genetic algorithms (GAs), which simulate the evolution of a population of
living beings through the mutation of the genetic codes of their chromosomes, do
not aim to provide an exact or optimal solution, but rather to produce
potential solutions that satisfy specified constraints. The simulation is based
on the assumptions that offspring inherit some characteristics of their parents
and that only those who are fit survive. How to define and evaluate the degree
of fitness depends on the application domain. Thus, to apply GAs to a problem,
one first needs to decide how best to construct the genetic codes so that they
appropriately represent the essence of the problem. A suitable method for
mutation together with a fitness function and a penalty function must be
devised to ensure that the individuals surviving through evolution faithfully
depict possible solutions to the problem. GAs have been applied successfully to
numerous complex problems that cannot be solved easily by analytical or
numerical methods. In particular, alternative solutions in computer-aided
design problems such as smoothing curves and bridge design have been generated
using this approach.

This paper explores how GAs can be used for correction of reflection lines in
an intuitive way. The aim is to remove some tedious tasks from designers, yet
provide them with further flexibility and control over the ways corrections are
performed. Section 2 discusses how the fitness function is designed and gives a
brief description of the overall algorithm. Section 3 covers implementation
details and Section 4 analyses the results.

2 Design of Genetic Algorithms 

2.1 Reflection Line 

There are a number of alternative methods for generating reflection lines (e.g.
[2,3]). For the first evaluation of our approach, we choose to use a simpler
definition of reflection lines proposed by Kaufmann and Klass [2], instead of
the physical reflection lines defined by Klass [3]. The latter case will be
dealt with in a future paper. To distinguish the two types of reflection lines,
we denote those proposed by Kaufmann and Klass as KKRL. We use parametric
spline curves that represent a surface and construct their corresponding
reflection lines as follows. Given a fixed vector V and an angle α, we look for
a point on each spline where the angle between the tangent vector to the curve
and V equals α. The line segment formed by connecting these points is the
reflection line corresponding to angle α. Thus, a family of reflection lines
can be constructed for a set of equally distributed angles (see Figure 1).

Fig. 1. KKRL: Reflection lines defined by Kaufmann and Klass
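
The construction above amounts to solving, on each curve, for the parameter at which the tangent makes the prescribed angle with V. A minimal sketch follows, assuming the angle varies continuously and crosses the target once inside the search interval, and using bisection as the root finder. The solver choice, the function names and the sample curve are assumptions made for illustration and are not taken from the system described here.

```python
import math

def angle_to(v, w):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    nv = math.sqrt(sum(a * a for a in v))
    nw = math.sqrt(sum(b * b for b in w))
    return math.acos(max(-1.0, min(1.0, dot / (nv * nw))))

def kkrl_parameter(tangent, V, alpha, lo=0.0, hi=1.0, tol=1e-10):
    """Parameter t in [lo, hi] where angle(tangent(t), V) equals alpha,
    found by bisection; returns None if the interval does not bracket it."""
    g = lambda t: angle_to(tangent(t), V) - alpha
    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi, g_hi = mid, g_mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

# Hypothetical curve c(t) = (t, t^2, 0), whose tangent is (1, 2t, 0).
tangent = lambda t: (1.0, 2.0 * t, 0.0)
V = (1.0, 0.0, 0.0)
t45 = kkrl_parameter(tangent, V, math.radians(45))
print(t45)
```

Repeating this for every spline and connecting the resulting points gives the reflection line for one angle; a family of lines is obtained by sweeping the angle over equally spaced values.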

2.2 Genetic Algorithms 

To design the first set of genetic algorithms to produce possible smoother
surfaces, we choose the simplest assumptions and constraints, and plan to
carefully analyse their performance before proceeding to more complex cases.
Although it is generally possible to correct the points on the reflection lines
in any direction, it appears that the simplest and most natural way is to
correct these points along the direction perpendicular to the curve tangent
vector. In addition, we need to incorporate the following constraints which are
considered desirable by the automobile industry:

— the amounts of correction should be kept to the minimum;

— the reflection lines must be as smooth as possible;

— local convexity is retained in the adjusted splines, obtained after the
reflection splines are corrected;

— the angle between the tangent vector of the adjusted splines and V remains
unchanged.

The first constraint ensures that the essence of the design has not been altered
significantly after the smoothing process. It also keeps the cost to the minimum.
The second and third constraints ensure that surface smoothness is achieved
and oscillations are avoided. The fourth constraint obeys the definition of the
reflection lines.



Representation scheme The population of individuals in our simulation are 
reflection lines, while the genetic codes are formed by sequences of corrected 
distances at points on each reflection line. Thus, the genetic codes can be repre- 
sented as a 2D array of corrected distances. 



Genetic operations and parameters As the current practice is to correct
reflection lines directly, while the adjustments to the surface are only
implied, in our first design of GAs we choose to use a fixed point crossover
and random point mutation strategy along each reflection line only. This will
facilitate comparative analysis of our method against existing ones. However,
this strategy will later be extended to use variable crossover points (e.g. at
points where the reflection line is least smooth), and to allow crossover along
parameter curves as well as along reflection lines. Although there is no sound
theory for selecting GA parameters such as population size, crossover rate and
mutation rate, there are some empirical results indicating that optimal
performance can be achieved when the population size is between 20 and 30, the
crossover rate is between 0.75 and 0.95, and the mutation rate is between 0.005
and 0.01.
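
A possible reading of these two operators on a chromosome stored as a 2D array (one row per reflection line, one corrected distance per point) is sketched below. The interpretation of "fixed point" as a common cut column applied to every row, the uniform perturbation used for mutation, and all names are assumptions made for this illustration.

```python
import random

def fixed_point_crossover(a, b, cut):
    """Swap the tails of every row (reflection line) at the same column."""
    child1 = [ra[:cut] + rb[cut:] for ra, rb in zip(a, b)]
    child2 = [rb[:cut] + ra[cut:] for ra, rb in zip(a, b)]
    return child1, child2

def random_point_mutation(chrom, rate, scale, rng=random):
    """Perturb each corrected distance independently with probability rate
    by a uniform amount in [-scale, scale]."""
    return [[h + rng.uniform(-scale, scale) if rng.random() < rate else h
             for h in row] for row in chrom]

# Two hypothetical parents: 2 reflection lines with 4 points each.
a = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
b = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
c1, c2 = fixed_point_crossover(a, b, cut=2)
print(c1)
print(random_point_mutation(c1, rate=0.01, scale=0.05, rng=random.Random(0)))
```

Because crossover acts row by row, genetic material is exchanged only along reflection lines and never across them, which matches the restriction stated for this first design.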



Selection of fitness function As stated above, a gene and a chromosome
are designed respectively as a single corrected distance and a set of corrected
distances. The fitness function f provides a mechanism to evaluate a set of
reflection lines with respect to the constraints. The selection of the fitness
function is even more application-oriented. In our case, the better a
chromosome fits the constraints, the more likely it is to be selected to
generate new chromosomes from which new reflection lines are obtained. In the
context of industrial design, these constraints can be interpreted in terms of
position and smoothness as follows:



— positional constraints

• each KKRL must lie within a strip specified by a designer after inspecting
the original RL;

• displacements of the points on the RL must be minimal.

— smoothness constraints of KKRL

• the sum of squares of curvatures must be as small as possible;

• the sum of second derivatives must be as small as possible.



Then the fitness function $f = f(S_{c^2}, S_{h^2}, N)$ is defined as

$$f = \frac{w_1}{S_{c^2}} + \frac{w_2}{S_{h^2}} - w_3 \times N \qquad (1)$$

where $w_1$, $w_2$, $w_3$ are the coefficients, $S_{c^2}$ and $S_{h^2}$ are
respectively the summations of the squared curvatures and of the squared
distances at all points determining the reflection lines, and N represents the
number of points which have violated the curvature constraints.
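
As a rough sketch, a fitness of this kind can be computed as below. The functional form used here, f = w1 / Sc2 + w2 / Sh2 - w3 * N, is an assumption: it rewards small sums of squared curvatures and squared displacements and penalises each violating point, but the exact signs and arrangement of terms should be checked against the original formula. The default weights are the values reported in Section 4; the small constant `eps`, which guards against division by zero, and the function name are additions made for this example.

```python
def fitness(curvatures, displacements, n_violations,
            w1=100.0, w2=15.0, w3=10.0, eps=1e-12):
    """Assumed form: f = w1 / Sc2 + w2 / Sh2 - w3 * N."""
    sc2 = sum(k * k for k in curvatures)        # sum of squared curvatures
    sh2 = sum(h * h for h in displacements)     # sum of squared displacements
    return w1 / (sc2 + eps) + w2 / (sh2 + eps) - w3 * n_violations

smooth = fitness([0.5, 0.5], [0.1, 0.1], n_violations=0)
wiggly = fitness([2.0, 2.0], [0.1, 0.1], n_violations=0)
violating = fitness([0.5, 0.5], [0.1, 0.1], n_violations=3)
print(smooth, wiggly, violating)
```

Under this form a smoother set of reflection lines scores higher than a wigglier one with the same displacements, and every point that violates the curvature constraint lowers the score by w3.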



2.3 Description of the Genetic Algorithms 

(1) Given a vector V and a surface, compute parametric spline curves 

(2) Compute KKRL corresponding to these spline curves 

(3) Ask user to specify area surrounding each KKRL within which 
the corrected one must lie 

(4) Generate an initial population of chromosomes (40 X 2D arrays of 
randomly chosen displacement of KKRL points) 

(5) For each chromosome (i.e. each set of displacements of KKRL points)

1. evaluate its fitness and its probability of crossover and mating

2. only display those KKRL that satisfy positional constraints

(6) Generate offspring from each pair of parents

(7) Perform crossover and/or mutation on the selected pairs of parents

(8) Repeat steps 5-7 until one of the following conditions is satisfied:

1. a number of chromosomes acquire a specified degree of fitness; 

2. a number of chromosomes satisfy visual requirement; 

3. the population is uniform. 

3 Implementation 

On an SGI workstation, a GUI which uses X Windows and Motif has been developed
in C++ for doing experiments, while the display of 3D curves or surfaces
is based on OpenGL.



3.1 The Overall System 

To obtain a set of the reflection lines, users have to provide the system with a 
set of 3D control points and a constant vector V. The system is currently able to 
display a set of reflection lines. After the users specify a set of parameters for the 
GA, the system will correct the reflection lines, and then display the corrected 
surface. 



3.2 Data Structure 

Non-Uniform Rational B-Splines (NURBS) have become the de facto standard
for CAD surface representation in car body design, and a NURBS representation
of a surface can be determined by a set of control points. Such a representation
may be found in many textbooks.

Following the definition of reflection line described in Section 2.1, a few
reflection lines with respect to a vector V = (0.2, 0.2, 0.8) are displayed in
Figure 2(a).





Fig. 2. KK reflection lines: (a) original; (b) corrected

Whenever the corrected offsets and the directions of change are given, a new
reflection line can be determined by a series of scalar parameters, and a family
of new reflection lines can be determined by a 2D array of scalars, $h_{ij}$.
Such a 2D array of floating point numbers is encoded as a chromosome, and used
as a representation of a family of reflection lines. Figure 3 describes a family
of reflection lines with their changing direction vectors and displacements
$h_{ij}$, with the encoded representation at the bottom.

As a basis for developing the GA, the data has been structured into classes;
the four major classes are B_Spline, Ref_Line, RLine_Surf and GA_RLine (see
Figure 4). B_Spline is used to structure B-splines, Ref_Line is for a single
reflection line and RLine_Surf for a set of reflection lines. Finally, GA_RLine
is designed for the genetic algorithms. These classes are designed as a cascade
relationship, represented in Figure 4, i.e. GA_RLine is based on RLine_Surf
combined with their operations and fitness function, while RLine_Surf is a
derived class of Ref_Line, and so on.

3.3 Curvatures and Fitness 

The curvatures of the corrected splines are calculated from derivatives that
are obtained by adding those of each original spline and those of its
difference spline:

$$\tilde{s}(u, v) = s(u, v) + d(u, v) \tag{2}$$

$$\tilde{s}_u(u, v) = s_u(u, v) + d_u(u, v) \tag{3}$$

$$\tilde{s}_{uu}(u, v) = s_{uu}(u, v) + d_{uu}(u, v) \tag{4}$$

where $\tilde{s}$ is the corrected spline, $s$, $s_u$ and $s_{uu}$ are the
parametric representation of the NURBS surface and its first and second
derivatives along the $u$ direction, while $d$, $d_u$ and $d_{uu}$ are the
difference spline and its first and second derivatives along the $u$ direction.

Fig. 4. The implementation of the system

4 Analysis of the Results 

We ran the genetic algorithm on the reflection lines with a population size of 30 
generated at random, crossover rate 0.8 and mutation rate 0.08. The coefficients 
$w_1$, $w_2$ and $w_3$ selected for the fitness function (1) in the experiments
are 100.0, 15.0 and 10, respectively.

The corrected KKRL can be seen on the right-hand side of Figure 5. Figure 6
shows how the fitness of the best solution changes over the generations.

The convergence of the genetic algorithm is interpreted qualitatively in terms
of the speed at which optimal solutions can be generated. From Figure 6 it can
be seen that the convergence of the genetic algorithm used in our experiments
is slow at the beginning, but speeds up dramatically at a later stage. One
reason could be that the initial random solutions are too far from potential
solutions.

The amount of computation involved in our approach is proportional to the
number of points on the reflection lines to be corrected. This compares
favourably with the traditional approach, whose computational cost grows with
the square of the number of control points.

5 Conclusion

A new approach to correcting reflection lines based on an evolutionary
technique has been proposed and implemented. Compared with other methods,
our approach has the advantages that corrections on all reflection lines can be
performed simultaneously and that users can influence the results by modifying
the fitness function.

We plan to explore this approach further in the following aspects: 

— using a variable crossover point at positions where curvature is less than a 
specified value; 

— performing fixed point crossover along parametric lines as well as reflection 
lines; 

— correcting reflection lines along different directions; 

— using physical reflection lines as defined by Klass [3];

— including other aesthetic constraints in the fitness function. 

Abstract. In this paper, we study the relationship between learning and
evolution in a simple abstract model, where neural networks capable of 
learning are evolved through genetic algorithms (GAs). The connective 
weights of individuals’ neural networks undergo modification, i.e., certain 
characters will be acquired, through their lifetime learning. By setting 
various rates for the heritability of acquired characters, which is a mo- 
tive force of Lamarckian evolution, we observe adaptational processes of 
the populations over successive generations. Paying particular attention 
to behaviours under changing environments, we show the following re- 
sults. The population with the lower rate of heritability not only shows 
more stable behaviour against environmental changes, but also main- 
tains greater adaptability with respect to such changing environments. 
Consequently, the population with zero heritability, i.e., the Darwinian 
population, attains the highest level of adaptation toward dynamic en- 
vironments. 



1 Introduction 

It is obvious that the adaptational processes of natural organisms consist of two 
complementary phases, each taking place at different spatio-temporal levels: 1) 
learning, occurring within each individual’s lifetime, and 2) evolution, occur- 
ring over successive generations of the population |4|flj1 j . Here, a simple question 
arises: “How should these processes of adaptation at the different levels be con- 
nected with each other for a greater advantage?” The main goal of this paper is 
to point to a possible direction for the answers to this question. 

In the history of evolutionary theory, there have been two major ideas that 
give different explanations for the motive force of natural evolution and the 
phenomenon of genetic inheritance: Lamarckism and Darwinism. The former 
regards the effect of “inheritance of acquired characters” as the motive force of 
evolution. Through interactions with the environment or learning, individuals 
may undergo some adaptive changes that will then somehow be encoded in their
genes and direct the evolutionary process. On the other hand, the central dogma 
of Darwinism is that the motive force of evolution is “(non-random) natural 
selection following on random mutation”; mutation itself has no direction, but 
some individuals with advantageous mutations will have more chance of survival 
and reproduction through natural selection. It claims that evolution is nothing 
but these cumulative processes of natural selection. As we know, the mainstream 
of today’s evolutionary theory follows Darwinism, and Lamarckism is regarded 
as wrong or as a heresy. 

In spite of the biological background mentioned above, from the viewpoint 
of engineering it is not necessary to consider only Darwinian evolution models. 
Indeed, the possibility of using the heredity of acquired characters would be 
quite attractive [3,5]. For this reason, we compared Darwinian and Lamarckian
evolution using a simple abstract model in our previous paper [TTT|. We showed 
that, under changing environments, the population with Darwinian evolution 
not only showed more stable behaviour against environmental changes, but sur- 
prisingly, could also maintain greater adaptability with respect to such dynamic 
environments than could the Lamarckian population. While Lamarckian popu- 
lations could adapt themselves quite quickly to a certain single situation, they 
had difficulty in leaving the specific state of adaptation once it had taken place 
owing to their extremely greedy strategy for genetic inheritance. That is why the 
Lamarckian population in m performed poorly under changing environments. 
Therefore, in this paper, we introduce a new parameter, heredity rate, into our 
Lamarckian model to control the amount of inheritance of acquired characters. 
We discuss whether there is any appropriate value range of the heredity rate that 
enables Lamarckian evolution to cope appropriately with changing environments 
while maintaining quick adaptability toward each single condition. 

Like other researchers who have recently considered adaptational processes
under changing environments [2,8,7], we believe that any evolutionary
computation for real-world application must be equipped with adaptability
toward dynamic situations. For this reason we have concentrated especially on
changing environments.

2 Experimental Model 

Here we present our experimental framework and settings. A hundred individuals 
come into a virtual “world,” with 500 units of initial “life energy” for each. Each 
individual has a feed-forward neural network that serves as its “brain,” meaning
that the individual takes action based on the network outputs (Figure 1). We
take an array of real numbers as a “chromosome” from which the neural network 
is developed. The chromosome directly encodes all the connective weights of the 
network [^. Values of the chromosomes in the initial generation are set randomly, 
from the range [−0.30, 0.30].

The world contains two groups of materials, “food” and “poison,” both of
which have distinctive features, i.e., patterns of bits. For example, in
Figure 2 materials in “group A” are food, and those in “group B” are poison.
The “don’t care” symbol in the figure means that the cell may be either black
or white. Thus, food and poison are discriminated by the upper three bits, and
the lower three bits are noise.

Fig. 1. The architecture of an individual.

Fig. 2. The materials in the virtual world.

Each time it is given a material, an individual inputs the pattern of the
material into its neural network and stochastically determines whether to
“eat” or “discard” it according to the network outputs. These actions are
not mapped directly from the outputs themselves. Instead, the network outputs
are first fed as signals to an “Action Decision Module” (Figure 1), which then
determines the action of the individual stochastically according to a Boltzmann
distribution. This type of stochastic mechanism is necessary to maintain the
possibility of seeking more advantageous behaviours, even if an individual has
already acquired a certain adequate behavioural pattern [10].
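
A stochastic action decision of this kind can be sketched as follows. The sketch assumes one network output signal per action and a temperature parameter; neither the number of signals nor the temperature value is given in the text, so both are illustrative assumptions, as are the function names.

```python
import math
import random
from typing import List, Sequence


def boltzmann_probabilities(signals: Sequence[float],
                            temperature: float = 1.0) -> List[float]:
    """Turn output signals into action probabilities, p_i proportional to exp(s_i / T)."""
    highest = max(signals)  # subtract the maximum for numerical stability
    weights = [math.exp((s - highest) / temperature) for s in signals]
    total = sum(weights)
    return [w / total for w in weights]


def decide_action(signals: Sequence[float],
                  rng: random.Random,
                  actions: Sequence[str] = ("eat", "discard"),
                  temperature: float = 1.0) -> str:
    """Sample one action according to the Boltzmann probabilities."""
    probabilities = boltzmann_probabilities(signals, temperature)
    return rng.choices(list(actions), weights=probabilities, k=1)[0]
```

Because every action keeps a non-zero probability, an individual that already prefers one action will still occasionally try the other, which is the exploratory behaviour the paragraph above calls for.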

If what the individual ate was food, it receives 10 units of energy and tries to 
train itself to produce the “eat” action with a higher probability for that pattern. 
Conversely, if the individual ate poison, it loses a comparable amount of energy 
and tries to train itself to produce the “discard” action with a higher probability 
for that pattern. When the individual discards the material, no learning is con- 
ducted. The aim of each individual is to maximize its life energy by learning a 
rule that discriminates food and poison, which in this case corresponds to a par- 
ity problem of three bits. We use the Back Propagation Learning (BP Learning) 
algorithm, in combination with a Reinforcement Learning framework, to train 
each individual. The learning coefficient and the inertia coefficient of BP
Learning are set at η = 0.75 and α = 0.8, respectively.

Each individual is repeatedly offered a certain number of materials, 400 in 
the current experiments, and learning occurs. We regard this number of repeated 
events as the length of an individual’s “lifetime.” At the end of each generation, 
some of the individuals are selected as parents by a stochastic criterion
proportional to the level of their energy, i.e. their fitness. Parents re-encode
their network connective weights, which were modified through their lifetime
learning, into their chromosomes according to a given heredity rate r
(Figure 3). In Figure 3, $W_0$ and $W_1$ represent the vectors of connective
weights at the time of birth and at the time of reproduction, i.e., after the
lifetime learning, respectively. Chromosomes $C_0$ and $C_1$ represent the one
from which an individual develops and the one which the individual produces
after learning, respectively. $D$ is a mapping from genotype to phenotype, and
$D^{-1}$ is its inverse. Individuals with r = 0 do not re-encode any acquired
characters into their chromosomes, but just hand the chromosomes that they
inherited from their parents to the process of GAs. On the other hand, with
r = 1 all the acquired characters are re-encoded into the chromosomes.
Therefore, we refer to the populations with r = 0 and those with 0 < r ≤ 1 as
Darwinian and Lamarckian, respectively, and especially refer to those with
r = 1 as full-Lamarckian. Chromosomes of the selected individuals undergo the
genetic processes of recombination and mutation. Here, the number of
crossing-over points is set randomly from the range 0 to 4. Each mutation
occurs at the rate of 5%, with a variation range of ±0.5. Thus, the selected
parents reproduce new offspring, which then undergo lifetime learning in the
following generation. Although the parameters in this paper are set
heuristically according to some preliminary experiments, we have confirmed
that changing these values within a moderate range results in qualitatively
similar outcomes.

Fig. 3. The mechanisms of genetic inheritance
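
The inheritance step can be sketched as below. Since the chromosome directly encodes the connective weights, the sketch takes the genotype-phenotype mapping to be the identity. The linear blend between the inherited chromosome and the learned weights is an assumption that merely reproduces the two stated extremes (r = 0 and r = 1); the exact rule is not given in the text. The uniform mutation distribution and all function names are likewise illustrative.

```python
import random
from typing import List, Sequence


def reencode(c0: Sequence[float], w1: Sequence[float], r: float) -> List[float]:
    """Blend the inherited chromosome c0 with the learned weights w1.

    Assumed rule: c1 = c0 + r * (w1 - c0), so r = 0 returns c0 (Darwinian)
    and r = 1 returns w1 (full-Lamarckian).
    """
    return [c + r * (w - c) for c, w in zip(c0, w1)]


def crossover(a: Sequence[float], b: Sequence[float], rng: random.Random) -> List[float]:
    """Recombine two parents using 0 to 4 randomly placed crossing-over points.

    Requires chromosomes with at least five genes.
    """
    n_points = rng.randint(0, 4)
    points = sorted(rng.sample(range(1, len(a)), n_points))
    parents = (a, b)
    child: List[float] = []
    source, previous = 0, 0
    for point in points + [len(a)]:
        child.extend(parents[source][previous:point])
        source, previous = 1 - source, point  # switch parent at each point
    return child


def mutate(chromosome: Sequence[float], rng: random.Random,
           rate: float = 0.05, spread: float = 0.5) -> List[float]:
    """Perturb each gene with probability `rate` by a value in [-spread, spread]."""
    return [
        g + rng.uniform(-spread, spread) if rng.random() < rate else g
        for g in chromosome
    ]
```

With small values of r, most of what is learned during a lifetime is discarded at reproduction, and the chromosome changes mainly through selection, recombination and mutation.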

3 Experimental Evaluations 

Now consider a world where food and poison are characterized by arrays of six
bits, as shown in Figure 2. At any given time, one group of the two is set up
as food and the other as poison. We refer to an environment where “group
A” is food as “Env A,” and the other where “group B” is food as “Env B.”
To consider changing environments, we make the world switch between “Env
A” and “Env B,” so that food and poison swap their roles repeatedly at a
fixed interval, which is here set at 20 generations. Although a situation such
as this may seem to be rather arbitrary or unrealistic, it can actually happen
that characters advantageous to survival are totally overturned. A well-known
example is the industrial melanism of certain moths in the Industrial Revolution
era in England [11]. Although we obtain qualitatively similar results even if we
use a more difficult and random situation, for clarity we show here the results
for this type of simple case, where the environment is repeatedly overturned.
All the experimental results shown hereafter are the average of 20 runs.

Fig. 4. The average fitness: Darwinian (r = 0.0) vs. full-Lamarckian (r = 1.0)
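
The switching schedule and the food/poison rule can be sketched as follows. Two details are assumptions made only for illustration: that the run starts in Env A, and that patterns whose upper three bits have odd parity belong to group A (the actual assignment is defined by Figure 2).

```python
from typing import Sequence


def environment_at(generation: int, interval: int = 20) -> str:
    """Return "A" or "B" for the environment in force at a given generation.

    Assumption: generation 0 starts in Env A, then the roles swap every
    `interval` generations.
    """
    return "A" if (generation // interval) % 2 == 0 else "B"


def group_of(pattern: Sequence[int]) -> str:
    """Classify a six-bit pattern by the parity of its upper three bits.

    Assumption: odd parity means group A. The lower three bits are noise.
    """
    return "A" if sum(pattern[:3]) % 2 == 1 else "B"


def is_food(pattern: Sequence[int], environment: str) -> bool:
    """A material is food when its group matches the current environment."""
    return group_of(pattern) == environment
```

Under this schedule the same pattern is food for 20 generations and poison for the next 20, so a rule memorised in the chromosome becomes harmful as soon as the environment flips.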

3.1 Darwinian (r = 0.0) versus full-Lamarckian (r = 1.0)

Let us first compare the two extreme cases: Darwinian (r = 0.0) and
full-Lamarckian evolution (r = 1.0). Figures 4(a) and (b) show the changes in the
average fitness of the populations. As is evident from the figures, the fitness of 
the full-Lamarckian population oscillates violently as the environment is over- 
turned, while that of the Darwinian population hardly oscillates and is more 
stable. The point that we should especially emphasize is that the fitness of the 
Darwinian population rises over successive generations. This suggests that a pop- 
ulation that can cope with both the rules of “Env A” and “Env B” is formed 
through Darwinian evolution. To confirm this practically, we let four groups of
populations (the initial, 500th, 2000th, and 5000th generations) conduct
learning under each of the two environments. Figures 6(a)-6(d) show
the learning curves of both populations under each environment. The figures 
show the changes in average output errors for the discrimination ability learned 
during their lifetime. The mean squared error is used to measure the difference 
between the actual outputs and the ideal outputs. We can confirm that the Dar- 
winian mechanism forms a population of individuals that learn both rules more 
appropriately as the generations proceed. In contrast, full-Lamarckian evolution 
produces individuals that cannot appropriately learn either rule. The two learn- 
ing curves for Lamarckian individuals in the later generation differ from each 
other, which means that they cope with one rule better than the other. However, 
even if the preferred rule is given, the Lamarckian population cannot learn it 
better than the Darwinian one. 

Fig. 5. Learning curves with various heredity rates

Fig. 6. Learning curves: Darwinian (r = 0.0) vs. full-Lamarckian (r = 1.0)



3.2 Controlling Heredity Rate (r) of Acquired Characters

We now control the amount of heredity by setting r at various values and
observe the evolutionary processes of the populations. Figures 7(a)-7(f) show
the changes in the average fitness of the populations with the heredity rate
set at 0.0005, 0.001, 0.01, 0.02, 0.05 and 0.1, respectively. As shown by the
figures, populations with a higher heredity rate become more unstable than
those with a lower rate. The fitness oscillations of the populations with
heredity rates smaller than 0.02 are within a rather tolerable range, while the
oscillations grow intolerable with larger heredity rates. Figures 5(a)-5(f)
show the learning curves for each environment of the 5000th-generation
populations with the heredity rate set at 0.0005, 0.001, 0.01, 0.02, 0.05 and
0.1, respectively. As we can see from the figures, the two learning curves
differ more from each other as the heredity rate gets higher, which indicates
that the evolutionary processes with lower heredity rates produce individuals
that can learn both rules more appropriately.

Fig. 7. The average fitness with various heredity rates



4 Discussion 

Under a dynamic environment, the ability to cope with various situations be- 
comes more important than the ability to cope appropriately only with a specific 
one. The direct Lamarckian effect that greedily transmits the “ability to perform 
something” is not only useless but also harmful under a changing environment. 
On the other hand, the indirect Baldwin effect that transmits the “ability to 
learn something” plays a crucial role. Thus, the Darwinian population became 
most adapted toward the changing environments in our experiment. 

In the real biological world, while a phenotype is developed dynamically 
through quite complex chemical processes according to the information of a 
genotype, it is very difficult to do the reverse, that is, to determine and compose 
the corresponding formation of the genotype for a certain phenotype. This is 
why Lamarckian inheritance is generally said to be infeasible. However, from 
our experimental results we may suggest another explanation for the essential 
reason why creatures selected the Darwinian strategy of genetic inheritance in 
the earlier stages of their evolution. The real world is an environment with 
strong dynamic characteristics; therefore Darwinian inheritance itself has been 
an advantageous strategy for adaptation to the real world. 

5 Conclusions

We evaluated on a simple abstract model how learning with inheritance of ac- 
quired characters affects the evolution of the population, especially under chang- 
ing environments. By controlling the amount of inheritance, we have shown 
that populations with lower heritability not only showed more stable behaviour 
against environmental changes, but also maintained greater adaptability with 
respect to such changing environments. 

Although it must be considered whether our experimental model and results 
are sufficiently general, we have clarified possible fundamental characteristics 
that are required for adaptation toward changing environments. Therefore, we 
believe that the results obtained here may give helpful suggestions in, for ex- 
ample, designing artificial intelligence systems or software agents that will be 
brought into play under dynamic environments. 

Abstract. Evolutionary programming (EP) has been widely used in
numerical optimization in recent years. The adaptive parameters in EP,
also called step size controls, play a significant role: they control the
step size of the objective variables in the evolutionary process. However,
step size control may not work in some cases. The step sizes are frequently
lost, which makes the search stagnate early. Applying a lower bound
can keep the step size in a workable range, but it also constrains the
objective variables from being further explored. In this paper, an
adaptively adjusted lower bound is proposed which supports better fine-tuned
searches and spreads out exploration as well.



1 Introduction 

Evolutionary programming (EP) [J has been applied to many optimization prob- 
lems successfully in recent years mm- A global optimization problem can be 
formalised as a pair $(S, f)$, where $S \subseteq \mathbb{R}^n$ is a bounded set
in $\mathbb{R}^n$ and $f : S \to \mathbb{R}$ is an $n$-dimensional real-valued
function. The problem is to find a vector $x_{\min} \in S$ such that
$f(x_{\min})$ is a global minimum on $S$. More specifically, it is required to
find an $x_{\min} \in S$ such that

$$\forall x \in S : f(x_{\min}) \le f(x)$$

Here $f$ does not need to be continuous, but it has to be bounded.

According to the description of Fogel |T| and Back and Schwefel [^, EP is 
implemented in this study as follows: 

1. Generate the initial population of $\mu$ individuals, and set the generation
$k = 1$. Each individual is taken as a pair of real-valued vectors,
$(x_i, \eta_i)$, $\forall i \in \{1, \ldots, \mu\}$, where $\eta_i$ is an
adaptive parameter. Each $x_i$ has $n$ components $x_i(j)$, $j = 1, \ldots, n$.

2. Evaluate the fitness score for each individual $(x_i, \eta_i)$,
$\forall i \in \{1, \ldots, \mu\}$, of the population based on the objective
function, $f(x_i)$.

3. For each parent $(x_i, \eta_i)$, $i = 1, \ldots, \mu$, create a single
offspring $(x_i', \eta_i')$ by:

$$\eta_i'(j) = \eta_i(j) \exp(\tau' N(0, 1) + \tau N_j(0, 1)), \tag{1}$$

$$x_i'(j) = x_i(j) + \eta_i(j) N_j(0, 1),$$

where $x_i(j)$, $x_i'(j)$, $\eta_i(j)$ and $\eta_i'(j)$ denote the $j$-th
component of the vectors $x_i$, $x_i'$, $\eta_i$ and $\eta_i'$, respectively.
$N(0, 1)$ denotes a normally distributed one-dimensional random number with
mean 0 and standard deviation 1. $N_j(0, 1)$ indicates that the random number
is generated anew for each value of $j$. The factors $\tau$ and $\tau'$ have
commonly been set to $(\sqrt{2\sqrt{n}})^{-1}$ and $(\sqrt{2n})^{-1}$,
respectively [^.

4. Calculate the fitness of each offspring $(x_i', \eta_i')$,
$\forall i \in \{1, \ldots, \mu\}$.

5. Conduct pairwise comparisons over the union of parents $(x_i, \eta_i)$ and
offspring $(x_i', \eta_i')$, $\forall i \in \{1, \ldots, \mu\}$. For each
individual, $q$ opponents are chosen randomly from all the parents and
offspring with an equal probability. For each comparison, if the individual’s
fitness is no greater than the opponent’s, it receives a “win.”

6. Select the $\mu$ individuals out of $(x_i, \eta_i)$ and $(x_i', \eta_i')$,
$\forall i \in \{1, \ldots, \mu\}$, that have the most wins to be parents of
the next generation.

7. Stop if the halting criterion is satisfied; otherwise, $k = k + 1$ and go to
Step 3.
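
Steps 3 to 6 above can be sketched as a single generation of EP for a minimisation problem. This is only an illustrative sketch: the function name `ep_generation`, the list-based representation of individuals, and the choice to mutate the objective variables with the parent's step sizes are assumptions rather than details confirmed by the text.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Individual = Tuple[List[float], List[float]]  # (objective variables x, step sizes eta)


def ep_generation(population: Sequence[Individual],
                  f: Callable[[Sequence[float]], float],
                  q: int,
                  rng: random.Random) -> List[Individual]:
    """Run one generation: mutate every parent, then select by q-opponent tournament."""
    mu = len(population)
    n = len(population[0][0])
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)

    offspring: List[Individual] = []
    for x, eta in population:
        shared = rng.gauss(0, 1)  # N(0,1): drawn once per individual
        new_eta = [e * math.exp(tau_prime * shared + tau * rng.gauss(0, 1)) for e in eta]
        new_x = [xj + e * rng.gauss(0, 1) for xj, e in zip(x, eta)]
        offspring.append((new_x, new_eta))

    union = list(population) + offspring
    fitness = [f(x) for x, _ in union]

    # Each individual meets q random opponents and wins when its value is no greater.
    wins = []
    for i in range(len(union)):
        opponents = [rng.randrange(len(union)) for _ in range(q)]
        wins.append(sum(1 for o in opponents if fitness[i] <= fitness[o]))

    ranked = sorted(range(len(union)), key=lambda i: wins[i], reverse=True)
    return [union[i] for i in ranked[:mu]]
```

Calling `ep_generation` repeatedly until a halting criterion is met corresponds to the loop formed by Steps 3 to 7.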

The adaptive parameter $\eta_i$, also called the step size control, in Step 3
is expected to adaptively adjust the step size for each objective variable.
Ideally, the step size of an objective variable is larger at the beginning of
the evolutionary process to speed up the search, and becomes smaller at a later
stage for better fine-tuning. Eq. (1) is the update rule for the adaptive
parameters at the component level [7]. An adaptive parameter value can survive
to the next generation when its corresponding variable value leads to higher
fitness. However, it can also survive to the next generation when the higher
fitness is caused by other objective variables.

During evolution, some of the adaptive parameters are reduced too fast, which
causes the affected variables to lose their usefulness in maintaining diversity
[8]. If, for example, the distance from $x_j$ to the minimum $x_j^*$ in the
$j$-th component is $|x_j - x_j^*| > 1$ and the adaptive parameter
$\eta_j < 0.001$, the probability of $x_j$ reaching $x_j^*$ will be very small.
One might expect the selection procedure to eliminate individuals showing this
phenomenon. In fact, some of these individuals perform even better than others
in the population. They survive and generate offspring with similar
characteristics. When all individuals share this feature after a certain number
of generations, stagnation occurs. This problem has been studied in detail in
an early
paper g. 

In evolution strategies [6,9], the lower bound $\eta^-$ is used to keep the step
size controls from being lost. One implementation of evolutionary programming
applies a small value 0.001 to replace the negative adaptive parameters when 
using the Gaussian update rule m- The empirical study in |S] has shown that 
a properly selected $\eta^-$ can improve EP’s performance significantly.

However, applying a lower bound may constrain the objective variables from
finer exploitation. Different problems also need different lower bounds, which
can only be determined empirically [8]. In this paper, we propose a scheme that
applies a dynamic lower bound to the adaptive parameters of each individual.
The use of such a dynamic scheme has improved the performance of EP
significantly. The rest of this paper is organised as follows. Section 2
provides a mathematical and empirical analysis of the adaptive parameters in
(1+1) EP. The scheme of applying a dynamic lower bound to the adaptive
parameters is presented in Section 3. Section 4 presents the main results of
this paper. It compares EP with different lower bound schemes on a set of
benchmark functions. Finally, Section 5 concludes the paper with some remarks.



2 Analysis 

In this section, we use a (1+1) EP as the optimization algorithm. Given an
$n$-dimensional real-valued function $f(x)$, using one parent in each
generation, the adaptive parameter $\eta_j$ is created by:

$$\eta_j^{(k+1)} = \eta_j^{(k)} \exp(\tau N_j(0, 1))$$

Here $j$ denotes the $j$-th component and $\tau = 1/\sqrt{n}$. This is a
modified version of Eq. (1). Given the initial $\eta_j^{(0)}$, we can find
$\eta_j^{(k)}$ after running $k$ generations of successful mutations. Note that
the actual generation number will be greater than or equal to $k$, as the
success rate of generating the offspring is no more than 1. Therefore, through
the sequence

$$\{\eta_j^{(0)}, \eta_j^{(1)}, \eta_j^{(2)}, \ldots, \eta_j^{(k)}\}$$

we get

$$\eta_j^{(k)} = \eta_j^{(0)} \exp\left(\tau \sum_{i=1}^{k} N_i(0, 1)\right)$$

The probability that the adaptive parameter $\eta_j^{(k)}$ will be smaller than
an arbitrary small number $\varepsilon$ is:

$$P_\eta = P(\eta_j^{(k)} < \varepsilon) = P\left(\eta_j^{(0)} \exp\left(\tau \sum_{i=1}^{k} N_i(0, 1)\right) < \varepsilon\right)$$

Since the sum of $k$ independent $N(0, 1)$ random variables has the
distribution [12, pp. 267]:

$$\sum_{i=1}^{k} N_i(0, 1) \sim N(0, k),$$

we get

$$P_\eta = P\left(\eta_j^{(0)} \exp(\tau N(0, k)) < \varepsilon\right) = P\left(N(0, k) < \ln(\varepsilon/\eta_j^{(0)})/\tau\right) = \int_{-\infty}^{C} \frac{1}{\sqrt{2\pi k}} \exp\left(-\frac{x^2}{2k}\right) dx$$
where $C = \ln(\varepsilon/\eta_j^{(0)})/\tau$. For sufficiently large $k$ the
following approximation [13] can be used:



The derivative $\frac{\partial}{\partial k}(P_\eta)$ can be used to evaluate the
impact of $k$ on $P_\eta$.



Ok 



11 , C2 , 1 C2 _3, 



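For $C = \sqrt{n} \ln(\varepsilon/\eta_j^{(0)})$, it is apparent from the above
equation that

$$\frac{\partial}{\partial k}(P_\eta) \begin{cases} > 0, & \text{if } \varepsilon < \eta_j^{(0)}, \\ < 0, & \text{if } \varepsilon > \eta_j^{(0)}. \end{cases}$$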

From the trend of the adaptive parameters, we intuitively know that
$\varepsilon < \eta_j^{(0)}$ after several generations, so that
$\frac{\partial}{\partial k}(P_\eta) > 0$. Thus, the larger the number of
generations $k$, the larger $P_\eta$. In other words, the probability that the
adaptive parameter $\eta_j^{(k)}$ becomes smaller than an arbitrary small
number $\varepsilon$ will be higher if $k$ is larger.
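
The dependence of this probability on $k$ can be checked numerically. The sketch below evaluates the probability exactly through the standard normal distribution function, using the fact that a variable distributed as $N(0, k)$ falls below $C$ with probability $\Phi(C/\sqrt{k})$. It does not use the approximation cited from [13]; the function name and the sample values are illustrative assumptions.

```python
import math


def prob_eta_below(epsilon: float, eta0: float, n: int, k: int) -> float:
    """Probability that eta after k successful mutations is smaller than epsilon.

    Uses tau = 1/sqrt(n) and C = ln(epsilon/eta0)/tau, so that the result is
    Phi(C / sqrt(k)), computed with the error function.
    """
    tau = 1.0 / math.sqrt(n)
    c = math.log(epsilon / eta0) / tau
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0 * k)))


if __name__ == "__main__":
    # Illustrative values only: n = 3, initial step size 3, threshold 0.001.
    for k in (10, 100, 1000):
        print(k, prob_eta_below(0.001, 3.0, 3, k))
```

For a threshold below the initial step size the probability grows with $k$ and approaches 1/2 from below, which agrees with the sign of the derivative stated above.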

To further evaluate the impact of $k$ on the adaptive parameter $\eta_j$, we
conducted a preliminary experiment with (1+1) EP. The benchmark function
tested was:

$$f(x) = -20 \exp\left(-0.2 \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n} \sum_{i=1}^{n} \cos(2\pi x_i)\right) + 20 + e,$$

$f(x)$ is a multimodal function with many local minima. The function
dimensionality $n$ was set to 3 and all components $x_j$ were initialized
uniformly at random over the range $[-32, 32]$. The total number of trials was
100, the maximum number of generations was 2000, and the initial adaptive
parameter $\eta_j^{(0)}$ was 3. We randomly selected one component of $\eta$ to
observe its variation, and only the $\eta$ values with successful mutations
were recorded. Figure 1(a) shows the average variations of the adaptive
parameter $\eta_j$. At random, the third component $\eta_3$ was selected. All
$\eta_3$ values at each successful generation were averaged over the number of
trials. Only averages based on more than 40 trials were drawn; for large $k$,
fewer trials generate that many successful mutations.
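
An experiment of this kind can be sketched as follows. The sketch assumes that a mutation counts as successful when the offspring has a strictly lower function value, and that the objective variables are mutated with the parent's step sizes; both choices, and the function names, are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def ackley(x: Sequence[float]) -> float:
    """The multimodal benchmark function used in the preliminary experiment."""
    n = len(x)
    mean_square = sum(v * v for v in x) / n
    mean_cosine = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_square)) - math.exp(mean_cosine) + 20.0 + math.e


def one_plus_one_ep(x: List[float], eta: List[float], generations: int,
                    rng: random.Random) -> Tuple[List[float], List[float], float, List[List[float]]]:
    """Run a (1+1) EP and record the step sizes after every successful mutation."""
    tau = 1.0 / math.sqrt(len(x))
    fx = ackley(x)
    history = [list(eta)]  # history[k] holds eta after k successful mutations
    for _ in range(generations):
        new_eta = [e * math.exp(tau * rng.gauss(0, 1)) for e in eta]
        new_x = [v + e * rng.gauss(0, 1) for v, e in zip(x, eta)]
        new_fx = ackley(new_x)
        if new_fx < fx:  # successful mutation: the offspring replaces the parent
            x, eta, fx = new_x, new_eta, new_fx
            history.append(list(eta))
    return x, eta, fx, history
```

Averaging `history[k][2]` over many independent trials gives a curve comparable to the one described for $\eta_3$, with fewer trials contributing as $k$ grows.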

It is clear that the larger $k$ becomes, the higher the probability of
obtaining a smaller adaptive parameter. The examples of the (1+1) EP have shown
that smaller adaptive parameters were preferred after several generations. That
is, through mutation and selection, the evolutionary process works with
large step sizes in the early stage and smaller ones after a certain number of
generations.

When the adaptive parameters decrease, the best situation is when the
objective variables are very close to the global optimum. Thus, the smaller
step sizes are preferred. If any of the step sizes decreases faster than the
objective variables approach the optimum, the process may stagnate because of
the unbalanced component pair. That is, the step size becomes very small while
the objective variable is still far away from the global optimum. In Figure
1(b), the worst relation pair in each generation is shown, i.e. the largest
ratio of an objective variable's distance from the optimum to its step size.
The averages per trial number and the experimental data are obtained from the
same results. Stagnation largely begins when this ratio is over 1000.

Fig. 1. (a) The average variations of $\eta_3$ are shown. The start of
$\eta_3^{(0)} > \eta_3^{(k)}$ is found at $k = 12$. (b) The average relation
pairs are shown. At $k = 25$, for example, after averaging over 49 trials, the
worst pair has a ratio of 588.

To prevent this unbalanced phenomenon, a lower bound $\eta^-$ is definitely
needed [8]. However, a fixed lower bound creates a limitation on the search.
The choice of the lower bound is problem dependent. In the next section, we 
propose a dynamic lower bound to improve the situation. 

3 Dynamic Lower Bound 

The implementation of the dynamic lower bound includes two steps. First, set up
an index to measure the adaptation of the lower bound. Then, adjust the lower
bound accordingly. The method used to evaluate the performance of the lower
bound is similar to the “1/5 success rule” [9, pp. 110]: if the number of
successful mutations is larger than 1/5 of all mutations, increase the step
size; otherwise, decrease the step size. We count the number of offspring
selected for the next generation, and compute the ratio of successes to all
offspring. Then, the following rule is applied to update the lower bound:

$$\eta^-_{k+1} = \eta^-_k \left(\frac{S_k}{A}\right), \tag{2}$$

where $S_k$ is the success rate at generation $k$ and $A$ is a reference rate,
which has been set between 0.25 and 0.45 in our experiments. It is worth
pointing out that our dynamic lower bound differs significantly from the 1/5
rule. The 1/5 rule is based on component-level adaptation, while our dynamic
lower bound is based on the whole population, i.e. $S_k$ is calculated across
the entire population.
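
The update rule can be sketched as follows, under the reading that the lower bound is multiplied by the ratio of the success rate to the reference rate. The function names and the way the bound is applied to the step sizes (clamping from below) are assumptions made for illustration.

```python
from typing import List, Sequence


def update_lower_bound(lower_bound: float, selected_offspring: int,
                       total_offspring: int, reference_rate: float = 0.3) -> float:
    """Scale the lower bound by (success rate / reference rate).

    The success rate is the fraction of offspring selected for the next
    generation, measured over the whole population.
    """
    success_rate = selected_offspring / total_offspring
    return lower_bound * success_rate / reference_rate


def apply_lower_bound(eta: Sequence[float], lower_bound: float) -> List[float]:
    """Keep every step size at or above the current lower bound."""
    return [max(e, lower_bound) for e in eta]
```

A success rate above the reference rate raises the bound and widens the search, while a success rate below it lowers the bound and allows finer steps. A success rate of zero would drive the bound to zero under this rule, from which a purely multiplicative update cannot recover.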

Applying the dynamic lower bound, we expect to eliminate the early stagna- 
tion, support the finer exploitation, provide the adaptively adjusted lower bound, 
and most essentially, spread out the population for extensive search. This scheme 
uses the lower bound to indirectly control the mutation step size to optimize the 
evolutionary process. When the success rate is above the reference rate, the
step size increases, encouraging the offspring to extend the search range
aggressively. Otherwise, the step size becomes smaller for a closer-range
search.

4 Results 

Six benchmark functions were tested, as shown in Table 1. The functions were
numbered as in [Ij; $f_1$, $f_2$, $f_5$ are unimodal functions and $f_9$,
$f_{10}$, $f_{11}$ are multimodal functions with many local minima. The EP
algorithm used in our study was the improved fast evolutionary programming
(IFEP) P]. The difference between IFEP and classical EP (CEP) is in Step 3 of
the algorithm described in Section 1. Instead of generating one offspring using
Gaussian mutation, IFEP creates two offspring, one by Gaussian mutation and the
other by Cauchy mutation. The better one is then chosen as the offspring. The
population size $\mu = 50$, the tournament size $q = 10$ for selection, the
reference rate $A = 0.3$, and the initial standard deviations 3.0 were used.
Two different IFEPs were tested, one with the dynamic lower bound initialized
to 0.1 and the other with the fixed $\eta^- = 0.0001$. The dynamic lower bound
in IFEP was updated every 5 generations using Eq. (2).
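
The offspring generation described above can be sketched as follows. The Cauchy random number is drawn by the inverse-transform method with scale parameter 1, which is an assumption, as are the function names. The step-size update of Eq. (1) and the lower bound are left out to keep the sketch short.

```python
import math
import random
from typing import Callable, List, Sequence


def cauchy(rng: random.Random) -> float:
    """Draw a standard Cauchy random number by inverse-transform sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))


def ifep_offspring(x: Sequence[float], eta: Sequence[float],
                   f: Callable[[Sequence[float]], float],
                   rng: random.Random) -> List[float]:
    """Create one Gaussian and one Cauchy candidate and keep the better one."""
    gaussian_child = [xj + e * rng.gauss(0, 1) for xj, e in zip(x, eta)]
    cauchy_child = [xj + e * cauchy(rng) for xj, e in zip(x, eta)]
    return min(gaussian_child, cauchy_child, key=f)  # minimisation
```

The heavy tails of the Cauchy distribution occasionally produce long jumps, while the Gaussian candidate favours local refinement, so keeping the better of the two combines both behaviours in one step.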

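Table 1. The 6 benchmark functions used in our experimental studies, where $n$ is the
dimension of the function, $f_{\min}$ is the minimum value of the function, and $S \subseteq R^n$.

Test function                                                                                                                          n    S                   f_min
$f_1(x) = \sum_{i=1}^{n} x_i^2$                                                                                                        30   $[-100,100]^n$      0
$f_2(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|$                                                                                30   $[-10,10]^n$        0
$f_5(x) = \sum_{i=1}^{n-1} [100(x_{i+1} - x_i^2)^2 + (x_i - 1)^2]$                                                                     30   $[-30,30]^n$        0
$f_9(x) = \sum_{i=1}^{n} [x_i^2 - 10\cos(2\pi x_i) + 10]$                                                                              30   $[-5.12,5.12]^n$    0
$f_{10}(x) = -20\exp(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}) - \exp(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)) + 20 + e$             30   $[-32,32]^n$        0
$f_{11}(x) = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n}\cos(\frac{x_i}{\sqrt{i}}) + 1$                                       30   $[-600,600]^n$      0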
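For readers who want to reproduce the test suite, the six functions can be written directly in Python. This is a minimal sketch using the commonly used forms of these benchmarks (sphere, Schwefel 2.22, Rosenbrock, Rastrigin, Ackley and Griewank). Where the printed table entry is not fully legible, in particular for $f_1$, the standard form is an assumption, and the function names are chosen here for illustration.

```python
import math


def f1(x):
    """Sphere function."""
    return sum(v * v for v in x)


def f2(x):
    """Sum of absolute values plus their product."""
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def f5(x):
    """Generalised Rosenbrock function."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def f9(x):
    """Generalised Rastrigin function."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def f10(x):
    """Ackley function."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


def f11(x):
    """Generalised Griewank function."""
    total = sum(v * v for v in x) / 4000.0
    product = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return total - product + 1.0
```

All six functions have minimum value 0: at the origin for every function except $f_5$, whose minimum is at $x_i = 1$.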



Table 2 summarises the experimental results of comparing IFEP with and
without a dynamic lower bound. All results have been averaged over 50 runs.
IFEP with the dynamic lower bound performs much better on $f_1$, $f_2$, $f_{10}$
and $f_{11}$ because it maximises IFEP's capability to keep searching for a better
function value from the beginning. However, a worse result was observed on $f_5$
and $f_9$. Both of these cases lead to early stagnation of the search. The experimental
data showed that the lower bound decreased too fast in certain generations
and could not be recovered by Eq. (2). This happened when all the individuals
were trapped in a local optimum, where the success rate of the generation cannot
provide any useful information about the search. The IFEP with
the fixed lower bound stagnates when the function value approaches the lower
bound value. Our experimental results appear to indicate that the proposed dynamic lower
bound is quite efficient in finding a near-optimal solution, but has a
weak ability to escape from a local optimum once trapped. There is a trade-off
between efficiency and optimality here. In general, we feel the dynamic lower
bound provides a good balance between the two. We are currently investigating
methods for escaping from a local optimum once the algorithm is trapped.



Table 2. Comparison between IFEP with the fixed and dynamic lower bound on
functions $f_1$, $f_2$, $f_5$, $f_9$, $f_{10}$, $f_{11}$. All results have been averaged over 50 runs. “dLB” and
“fLB” mean the dynamic and fixed lower bound respectively. “Mean Best” indicates
the mean best function values found in the last generation.

Function   Number of     IFEP w/dLB              IFEP w/fLB              dLB-fLB
           generations   Mean Best   Std Dev     Mean Best   Std Dev     t-test
$f_1$      2000          2.00e-17    3.10e-17    3.33e-7     4.05e-8     -58.22†
$f_2$      2000          6.56e-10    6.97e-10    2.31e-3     1.61e-4     -101.56†
$f_5$      20000         1.83        1.43        3.58e-1     5.77e-1     6.76†
$f_9$      5000          7.50        3.67        2.48        1.60        8.88†
$f_{10}$   2000          2.15e-9     2.47e-9     4.16e-4     1.99e-5     -147.74†
$f_{11}$   2000          1.27e-2     1.76e-2     1.17e-1     1.64e-1     -4.46†

†The value of t with 49 degrees of freedom is significant at $\alpha = 0.05$ by a two-tailed
test.



5 Conclusions 

Lower bounds for adaptive parameters in EP and evolution strategies are an
important issue that has been overlooked by most researchers. This paper
analyses the variation of the adaptive parameter in (1+1) EP
through mathematical and empirical approaches. Both demonstrate the
necessity of adding a lower bound to the adaptive parameters. This paper also
proposes a scheme that applies a dynamic lower bound to control the search
step size indirectly. The scheme combines population-level adaptation based on the success rate with
component-level self-adaptation of the adaptive parameters to optimise
evolutionary performance. The experimental results have shown that this dynamic
lower bound can provide better performance in IFEP for most of the numerical
functions we tested.

The mathematical analysis of (1+1) EP in this paper did not consider
selection, which also has an impact on the step size. The proposed update
rule of the dynamic lower bound does not work in some cases. However, this
scheme provides a good direction for promoting and maximising the performance of
evolutionary algorithms.
Abstract. In this paper, we discuss an approach to an operator scheduling
problem in a large organization over time with the aim of maintaining
service quality and reducing total labor costs. We propose a genetic
algorithm (GA) with a parameterized fitness function inspired by
homotopy methods and with null mutation to handle a variable number
of operators. The proposed method is applied to the practical problem
of scheduling operators in a telephone information center. Experimental
results show that the proposed method performs consistently better than
a previously developed GA method.



1 Introduction 

In the operator scheduling problem for customer service operations at a
telephone information center, we are given a set of working shifts with known start
and end times and a number of short breaks to be taken during each work session.
The primary objective is to minimize staff shortages relative to the number of customer
calls over time. This objective is so important for maintaining service quality that
it is treated as a constraint: the shortage must be zero. The secondary
objective is to minimize labor costs, or the surplus of operators over actual needs.
Other objectives such as overtime and employee satisfaction are not considered.
This problem reflects the very significant needs of a large organization such as
an information service center for telephone directory assistance. Constructing a
good schedule by hand, however, can be very difficult. Nippon Telegraph and
Telephone Corporation (NTT), for example, has more than one hundred such
centers all over Japan and currently suffers huge deficits. There is an urgent
demand for automatically supplying efficient schedules in a short time in response
to frequently changing work shift patterns and distributions of customer calls.

Genetic algorithms (GAs) have been successfully applied to a variety of 
scheduling problems including jobshop and flowshop | ^|7| 4 |b|8j . Yoshimura and 
Nakano [2] first applied GAs to the information operator scheduling problem. 
They proposed a GA with mutation especially dedicated to the problem and a 
partial reinitialization method with good success. The more general form of the 
problem is discussed in [3] under the name of the employee scheduling problem,
where a tabu search approach is proposed to solve the problem and compared
with other methods.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 50–57, 1999.

Fig. 1. Time distribution of required information operators

The organization of this paper is as follows. Section 2 explains the information
operator scheduling problem and its objective functions. In Section 3 we briefly
review the GA approach previously proposed by Yoshimura and Nakano [9]
and then modify their mutation to handle a variable number of operators. In
Section 3.4, a GA with ranking-based selection, duplicate elimination and local
search is proposed. A new approach using a parameterized fitness function is
proposed in Section 4. Experimental results using real data supplied by NTT
are reported in Section 5.



2 The Information Operator Scheduling Problem 

The number of human operators required to deal with inquiry calls from
customers changes over time, and its time distribution is given based on statistical
data at each center. Figure 1 shows such an example sampled at one of NTT’s
largest centers. The service starts at $t$ = 8:00 and ends at $t$ = 23:00. Time
is measured in units of five minutes, so the total service
interval of 15 hours corresponds to $T = 180$ time units. The vertical axis represents
the number of required operators for each time unit and is denoted by $n_1(t)$. A
solid line at the top of the $n_1(t)$ histogram represents the tolerable surplus $n_2(t)$: an
acceptable margin of at most 5% at each time unit to absorb daily fluctuations.

A shift type specifies the work starting and ending times and the number of
breaks to be taken during the work session. The number of breaks depends on
the length of the shift type, and the length of one break is fixed at 10 minutes (=
2 time units). A break pattern is a placement of breaks under a given shift type.
A working pattern of an operator can be represented by specifying its shift type
and break pattern. An operator can choose any shift type from a list of admissible
shift types and any break pattern under the constraint that the length of any
continuous working period must be between 30 and 60 minutes. Table 1 shows
an example set of admissible shift types and the number of breaks. For example,
a shift type with starting time 8:00 and ending time 12:00 has four breaks.

Table 1. List of admissible working shift types

No.   Shift Type       Breaks     No.   Shift Type       Breaks
1      8:00 - 12:00    4           9    14:00 - 18:00    4
2      8:30 - 12:00    3          10    14:00 - 19:00    5
3      8:30 - 12:30    4          11    17:00 - 20:00    3
4      8:30 - 13:00    4          12    17:00 - 21:00    4
5      9:00 - 13:00    4          13    17:30 - 22:00    4
6      9:00 - 14:00    5          14    17:30 - 23:00    5
7     13:00 - 17:00    4          15    19:00 - 23:00    4
8     13:00 - 17:30    4

Let us assume there are a total of $D$ operators available per day, and each of
these operators is assigned a shift type selected from Table 1 and a break pattern.
A schedule is obtained by finding a combination of $D$ working patterns with
possibly different shift types and break patterns. Each chosen working pattern
corresponds to a (partial) schedule of one operator. Please note that even though
the total number $D$ is fixed, the total labor costs differ depending on the total
length of the chosen shift types. In a center, operators must work in pairs, thus
a working pattern is shared by two operators. To avoid confusion, however, we
simply assume that one working pattern corresponds to one operator.

Let $n(t)$ be the number of operators working at time $t$ under a schedule $S$.
The total shortage of operators $f_1$ and the total surplus $f_2$ are defined in Equation (1),
where $\lfloor x \rfloor = x$ if $x > 0$; otherwise $\lfloor x \rfloor = 0$.

$$f_1 = \sum_{t=1}^{T} \lfloor n_1(t) - n(t) \rfloor, \qquad f_2 = \sum_{t=1}^{T} \lfloor n(t) - n_2(t) \rfloor. \qquad (1)$$

The objective of the information operator scheduling problem is to minimize $f_2$
under the constraint that $f_1$ must be zero. In [9], a single $f$ with a constant
$a \in [0,1]$ in Equation (2) is used as a fitness measure. Another type of fitness
function, $f^{\alpha}$ shown in Equation (3), can also be considered, where the constant $\alpha$
must be small enough to satisfy $f_1 = 0$.

$$f = \frac{a}{f_1 + 1} + \frac{1 - a}{f_2 + 1} \qquad (0 \le a \le 1). \qquad (2)$$

$$f^{\alpha} = \frac{1}{1 + E^{\alpha}}, \qquad E^{\alpha} = f_1 + \alpha f_2 \qquad (0 \le \alpha \le 1). \qquad (3)$$
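The objective functions can be evaluated with a few lines of Python. The sketch below assumes that the shortage is measured against the required level $n_1(t)$ and the surplus against the tolerable level $n_2(t)$, that Equation (2) has the form $a/(f_1+1) + (1-a)/(f_2+1)$, and that Equation (3) has the form $1/(1 + f_1 + \alpha f_2)$. The function names and the toy data are hypothetical.

```python
def positive_part(x):
    """Return x if x > 0, otherwise 0."""
    return x if x > 0 else 0


def shortage_and_surplus(working, required, tolerable):
    """Compute (f1, f2) from per-time-unit operator counts.

    working[t]   -- n(t), operators at work under the schedule
    required[t]  -- n1(t), operators needed
    tolerable[t] -- n2(t), required level plus the accepted margin
    """
    f1 = sum(positive_part(r - n) for n, r in zip(working, required))
    f2 = sum(positive_part(n - u) for n, u in zip(working, tolerable))
    return f1, f2


def fitness_fixed(f1, f2, a=0.7):
    """Weighted fitness with a fixed constant a (assumed form of Eq. (2))."""
    return a / (f1 + 1) + (1 - a) / (f2 + 1)


def fitness_parameterized(f1, f2, alpha):
    """Fitness 1 / (1 + E) with E = f1 + alpha * f2 (assumed form of Eq. (3))."""
    return 1.0 / (1.0 + f1 + alpha * f2)
```

For example, `shortage_and_surplus([3, 5, 8], [4, 5, 6], [5, 6, 7])` counts one missing operator in the first time unit and one surplus operator in the last.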
3 Genetic Algorithms

3.1 Solution representations 

A schedule $S$ consists of a set of partial schedules of all the operators and is
denoted by $S = \{s_1, s_2, \ldots, s_D\}$. Each partial schedule $s_i$ is a working pattern
of an operator and is represented by a string of 10 integer-valued genes as $s_i =
a_1 a_2 \cdots a_{10}$, where $a_{10}$ represents its shift type number given in Table 1 and $a_9$
the number of breaks, whereas $a_1 a_2 \ldots a_8$ represent the continuous working lengths
before and after the breaks, and thus together a break pattern. Each of $a_1, a_2, \ldots, a_8$
must be between 30 minutes (= 6 time units) and 60 minutes (= 12 time units) as
described in Section 2, and only $a_1, \ldots, a_m$ ($m = a_9 + 1 \le 8$) are actually used.
For example, an operator who starts working at time $t_s$ and ends at $t_e$ as specified
by a shift type $a_{10}$ first works for a duration specified by $a_1$, then takes the first
ten-minute (= 2 time units) break, resumes work for a duration specified by
$a_2$, and so on. The following equality must be satisfied (for more details, please
refer to [9]):

$$t_s + a_1 + 2 + a_2 + 2 + \cdots + a_m = t_e \qquad (6 \le a_j \le 12,\; m = a_9 + 1). \qquad (4)$$
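A feasibility check for one working pattern follows directly from the equality constraint above. The sketch below is an illustration only: it assumes that times are counted in five-minute units from the start of service (so 8:00 is 0 and 12:00 is 48), that unused genes are stored as zeros, and that the shift's start and end times are passed in separately rather than looked up from the shift type gene. The function name is hypothetical.

```python
def is_valid_pattern(genes, start, end, min_len=6, max_len=12, break_len=2):
    """Check one 10-gene working pattern against the timing constraint.

    genes[0:8] -- working lengths a_1..a_8 in time units (only the first m used)
    genes[8]   -- a_9, the number of breaks
    genes[9]   -- a_10, the shift type number (not interpreted here)
    """
    breaks = genes[8]
    m = breaks + 1                      # number of working periods
    if m > 8:
        return False
    work = genes[:m]
    if any(not (min_len <= a <= max_len) for a in work):
        return False
    # Working periods and breaks must exactly fill the shift.
    return start + sum(work) + break_len * breaks == end
```

For shift type 1 (8:00 to 12:00, four breaks), five working periods of 40 minutes each fill the shift exactly: 5 x 8 + 4 x 2 = 48 time units.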



3.2 Mutation 

For each $s_i$ selected for mutation with probability $P_{mut}$, one
of the following M1 to M4 is applied with the probabilities $p_1$, $p_2$, $p_3$ and $p_4$
($p_1 + p_2 + p_3 + p_4 = 1$), respectively.

M1 Two genes $a_{j_1}$ and $a_{j_2}$ are randomly selected and their values are exchanged.
M2 A gene $a_{j_1}$ with a value greater than 30 minutes is randomly selected and
decreased by 5 minutes, and another gene $a_{j_2}$ with a value smaller than 60 minutes
is randomly selected and increased by 5 minutes.
M3 $a_1, \ldots, a_m$ are randomly regenerated under the constraint of Equation (4),
while $a_9$ and $a_{10}$ remain the same.
M4 $a_{10}$ is probabilistically changed to the next ($a_{10}+1$) or the previous ($a_{10}-1$)
type in Table 1 and $a_9$ to the corresponding number of breaks, and then $a_1, \ldots, a_m$
are randomly regenerated with the new $a_9$, $a_{10}$ under Equation (4).

The mutation defined above assumes that the number of genes and the total
number of operators $D$ are fixed. However, it is desirable to extend the mutation
to allow $D$ to vary up to an upper bound $D_0$ during the search in order to find a
solution of higher quality. For this purpose, a special gene value null for $a_{10}$, meaning
that the operator is off duty, is introduced. The following mutation M5, called
null mutation, is applied with probability $p_5$.

M5 (null mutation): $a_{10}$ is probabilistically changed to null.

The mutation M4 is slightly modified to incorporate this change such that if
$a_{10}$ is null, it is changed to any type in Table 1 at random.
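Two of these operators are simple enough to sketch in Python. The code below illustrates M2 and the null mutation under stated assumptions: working lengths are stored in five-minute units (so 30 and 60 minutes become 6 and 12), `None` stands for the null gene, and the function names are hypothetical. M2 leaves the total working time unchanged, so a pattern that satisfied the timing constraint before the mutation still satisfies it afterwards.

```python
import random


def mutate_m2(work, rng, min_len=6, max_len=12):
    """Shift one time unit (5 minutes) from one working period to another.

    work -- list of the active working lengths a_1..a_m in time units.
    Returns a new list; the input is returned as a copy if no move is possible.
    """
    can_shrink = [j for j, a in enumerate(work) if a > min_len]
    if not can_shrink:
        return list(work)
    j1 = rng.choice(can_shrink)
    can_grow = [j for j, a in enumerate(work) if a < max_len and j != j1]
    if not can_grow:
        return list(work)
    j2 = rng.choice(can_grow)
    child = list(work)
    child[j1] -= 1
    child[j2] += 1
    return child


def apply_null_mutation(genes):
    """Mark the operator as off duty by setting the shift type gene to None."""
    child = list(genes)
    child[9] = None
    return child
```

A typical call is `mutate_m2([8, 8, 8, 8, 8], random.Random(0))`, which returns a list with one entry decreased to 7 and another increased to 9.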
1. Initialize population: randomly generate a set of $P$ schedules.

2. Repeat Step 2a to Step 2d $L_1$ times:

(a) Select two schedules $S_1$, $S_2$ from the population with probabilities
inversely proportional to their fitness ranks.

(b) Apply crossover with probability $P_{cross}$ and obtain $T_1$ and $T_2$; otherwise
just copy $S_1$ and $S_2$ to $T_1$ and $T_2$.

(c) For $i = 1, 2$, repeat as follows $L_2$ times:

Apply mutation to $T_i$ and obtain $T_i'$. If $T_i'$ is at least as good as $T_i$, replace
$T_i$ with $T_i'$.

(d) For $i = 1, 2$, if $T_i$ is better than the worst in the population, and no
member of the current population has the same fitness as $T_i$, replace the
worst individual with $T_i$.

3. Output the best member in the population and terminate.



Fig. 2. Genetic local search for information operator scheduling problem



3.3 Crossover 

A partial schedule-wise uniform crossover is employed as follows. Let two
parent solutions be $S_1 = \{s_{11}, s_{12}, \ldots, s_{1m}\}$ and $S_2 = \{s_{21}, s_{22}, \ldots, s_{2m}\}$. Before
applying crossover to $S_1$ and $S_2$, their partial schedules $s_{ij}$ are sorted first by
$a_{10}$ and then by $a_1, \ldots, a_m$ in the case of ties. Let us denote the results by
$S_i' = \{s_{i1}', s_{i2}', \ldots, s_{im}'\}$ ($i = 1, 2$). A new schedule $T_1$ is generated by selecting $s_{1j}'$ or
$s_{2j}'$ randomly for each $j$ ($1 \le j \le m$). Similarly, $T_2$ is generated by selecting, for
each $j$, the $s_{ij}'$ that is not selected for $T_1$.
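The crossover just described can be sketched in Python as follows. The sketch assumes that each partial schedule is a tuple of 10 integer genes with unused working-length genes stored as zeros, so that sorting on the first eight genes is equivalent to sorting on the active ones, and that no shift type gene is null. The function names are hypothetical.

```python
import random


def sort_partial_schedules(schedule):
    """Sort by shift type a_10 first, then by the working lengths a_1..a_8."""
    return sorted(schedule, key=lambda s: (s[9], s[:8]))


def uniform_crossover(s1, s2, rng):
    """Exchange whole partial schedules position by position after sorting."""
    if len(s1) != len(s2):
        raise ValueError("both parents must contain the same number of partial schedules")
    p1 = sort_partial_schedules(s1)
    p2 = sort_partial_schedules(s2)
    t1, t2 = [], []
    for g1, g2 in zip(p1, p2):
        # T1 receives one of the two candidates at random, T2 the other one.
        if rng.random() < 0.5:
            t1.append(g1)
            t2.append(g2)
        else:
            t1.append(g2)
            t2.append(g1)
    return t1, t2
```

Sorting before the exchange aligns similar working patterns at the same position in both parents, and every partial schedule of the parents appears in exactly one of the two offspring.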



3.4 Genetic local search 

It is well known that there are some problem classes in which GAs are not well 
suited for fine-tuning structures that are very close to optimal solutions and 
that it is essential to incorporate local search methods into GAs. The result 
of such incorporation is often called Genetic Local Search (GLS) [Sj. In this 
framework, an offspring obtained by a recombination operator is not included 
in the next generation directly but is used as a “seed” for the subsequent local 
search. The local search moves the offspring from its initial point to the nearest 
locally optimal point, which is included in the next generation. 

The mutation discussed in Section 3.2 is used for local search here. Instead of
applying mutation only once to each individual generated by the crossover, it
is applied repeatedly and the results are accepted only when they are improved
(or at least the same). Figure 2 shows the outline of our GLS algorithm based
on the steady-state model with ranking selection. The reinitialization method
introduced in [9] is replaced by the duplicate elimination technique in Step 2d
to avoid premature convergence even under a small-population condition.



Information Operator Scheduling by Genetic Algorithms 






Table 2. Performance comparison under various parameter conditions

       parameters                            results
No.    fitness                    D          $f_1$(avg.)   $f_2$(avg.)   $f_2$(best)   D(avg.)
I      $f$ in [9]                 105        0             173.5         135           -
II     $f^{\alpha}$ ($\alpha = 1$)   105     9.7           118.1         -             -
III    $f^{\alpha}$               105        0             143.1         123           -
IV     $f^{\alpha}$               100        3.0           102.5         -             -
V      $f^{\alpha}$               ≤ 120      0             128.7         107           103.2
VI     $f^{\alpha}$               ≤ 105      0             119.9         97            102.8



4 Parameterized Fitness Function 

Yoshimura and Nakano use $f$ in Equation (2) with $a = 0.7$ as a fitness
function [9]. They observe that their GA finds a solution with $f_1 = 0$ effectively
as long as $a > 0.3$, while the quality of $f_2$ is not always excellent. In their
experiments, $f_1$ and $f_2$ start from large values, then $f_1$ quickly decreases to zero
and does not increase again, while $f_2$ decreases very slowly. Once a solution $S$
with $f_1(S) = 0$ is found, a new solution $S'$ with $f_1(S') > 0$ has difficulty surviving
because $f(S')$ is inferior to $f(S)$ in many cases. Thus only a limited region where
$f_1$ is always 0 is searched. On the other hand, if $f^{\alpha}$ with $\alpha = 1$ in Equation (3)
is used, both $f_1$ and $f_2$ decrease smoothly, but $f_1$ does not reach zero or come close
to zero. One may be able to overcome this dilemma by finding an optimal $\alpha$ in
Equation (3), but this itself is quite difficult.

As a possible remedy, we treat $\alpha$ as a parameter, which decreases from 1 to
0 throughout the search. For our purpose, the algorithm in Figure 2 is slightly
modified to use the parameterized $f^{\alpha}$, in which $\alpha$ is first initialized as $\alpha = 1$ in
Step 1 and is changed as $\alpha := (1 - \epsilon)\alpha$ in Step 2b after the crossover is applied,
where $\epsilon > 0$ is a small constant.
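The effect of shrinking the parameter can be seen in a small Python sketch. It assumes the weighted sum $E = f_1 + \alpha f_2$ is the quantity being minimized and uses a geometric decay of $\alpha$. The decay constant 0.01 and the two candidate schedules are purely illustrative values, not settings from the experiments, and the function names are hypothetical.

```python
def alpha_schedule(iterations, epsilon):
    """Return the values alpha takes when multiplied by (1 - epsilon) each iteration."""
    alpha = 1.0
    values = []
    for _ in range(iterations):
        values.append(alpha)
        alpha *= (1.0 - epsilon)
    return values


def energy(f1, f2, alpha):
    """Weighted sum of shortage f1 and surplus f2; smaller is better."""
    return f1 + alpha * f2


# Two hypothetical schedules: one feasible with a large surplus,
# one slightly infeasible with a smaller surplus.
feasible = (0, 150)
infeasible = (5, 100)

alphas = alpha_schedule(250, epsilon=0.01)
early, late = alphas[0], alphas[-1]
prefers_infeasible_early = energy(*infeasible, early) < energy(*feasible, early)
prefers_feasible_late = energy(*feasible, late) < energy(*infeasible, late)
```

With these numbers the infeasible schedule is ranked higher at the start of the search and the feasible one at the end, which is the intended shift from a relaxed problem to the constrained one.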

The idea of a parameterized fitness function is inspired by a far more
sophisticated approach known as the homotopy method, which has been used for
decades to find solutions of nonlinear equations [1]. By initializing $\alpha = 1$, we
start from a relaxed problem in which minimizing $f_2$ is easier at the cost of
violating the constraint $f_1 = 0$. $\alpha$ is then gradually decreased to enforce $f_1 = 0$,
and finally a schedule with $f_1 = 0$ and reasonably small $f_2$ is obtained.

5 Experimental Results 

Numerical experiments based on the data given in Figure 1 and in Table 1 are
carried out under various conditions. Figure 3 shows the average time evolutions
of $f_1$ and $f_2$ over 40 runs each on a SUN UltraSO workstation. The programs are
written in C, and each run takes about 25 minutes of CPU time. All
experiments were conducted under these conditions: the population size $P = 9$,
the crossover and mutation rates $P_{cross} = 0.2$ and $P_{mut} = 0.02$ respectively,
the probabilities for each mutation $p_1 = 0.2$, $p_2 = 0.45$, $p_3 = 0.1$ and $p_4 = 0.25$,
$L_1 = 250$, $L_2 = 2000$ and $\epsilon = 0.99$. These values are determined based
on preliminary experiments. Results are summarized in Table 2. In Figure 3(a),
the number of operators $D$ is fixed to 105 as in [9]. In the figure, the results of
three different fitness functions are compared: (I) the same fitness function used
in [9], (II) $f^{\alpha}$ with $\alpha = 1$, and (III) the parameterized $f^{\alpha}$ with $\alpha$ decreasing
from 1 to 0. It is clear that by using the parameterized fitness function, the
quality of $f_2$ greatly improves while the constraint $f_1 = 0$ is satisfied at the end
of the computation.

Fig. 3. Time evolutions of $f_1$ and $f_2$ under (a) fixed and parameterized fitness
functions, and (b) different $D$ settings

In Figure 3(b), the null mutation M5 is applied as well as M1, ..., M4 to make
$D$ changeable. $D$ is initialized as $D_0 = 120$ under (V) and $D_0 = 105$ under (VI)
respectively, and can be varied during the search with $D_0$ as the upper bound. $p_4$ is
modified from 0.25 to $0.25 \times (1 - 1/N_g)$, and $p_5 = 0.25 \times 1/N_g$, with $N_g$ the number
of shift types. The results of fixed $D$ with (III) $D = 105$ and (IV) $D = 100$ are
also shown for comparison. It can be seen that changing $D$ dynamically results
in better performance, with the optimal $D$ around 102-103. The results under
(IV) suggest that it is quite difficult to find a good solution with $f_1 = 0$ when
$D \le 100$. The best results are obtained when the parameterized fitness function
and the modified mutation are used. Figure 4 shows one of the best schedules
obtained under (VI). The picture on the right in Figure 4 shows the schedule,
where the x axis represents time and the y axis the chosen working patterns.
A filled block indicates that an operator is at work, while each small white
block is a ten-minute break. Among the 105 operators initially assigned, a total of
102 operators were found to be actually necessary, and the total surplus is 97,
meaning that only 0.54 operators on average (97/180) are redundant per time interval. The
picture on the left shows the corresponding time distribution of the operators.
Fig. 4. Example of a solution with $D = 102$, $f_1 = 0$ and $f_2 = 97$



6 Conclusions 

We have developed a genetic algorithm with local search for the information
operator scheduling problem. The experimental results show that the use of a
parameterized fitness function and null mutation improves the solution quality
with a smaller number of total operators, while satisfying the given constraints.
Future research will investigate better ways of controlling $\alpha$ than decreasing
it monotonically.
Abstract. We report key algorithm-specific features involved in the evolutionary solution of the
radial network problem. We focus on the dimensionality problem of large-scale networks
and on the singularities of the radial topology search space. We (1) report the difficulties of
the canonical genetic algorithm in handling network topology constraints, and (2) present
both the genotype information structure and the recombination operator to overcome such
difficulties. The proposed recombination operator processes genetic information as
meaningful topological structures, and turns radiality and connectivity into genetically transmissible
properties. Results are presented to illustrate the difference between the canonical approach
and the approach taken.

Keywords. Evolutionary computation, network planning, radial topology constraints. 



1 Introduction 

A large class of important optimization problems has as yet no reasonably fast and robust
algorithmic solution: large-scale network planning problems belong to this class
when the objective function is non-convex. Network planning comprises a set of
different problems whose solutions are subsets of arcs from a given graph. These are
important problems in fields like electric power and gas distribution, and
telecommunications. The problems differ in their solution topology requirements.

Network planning has been approached in the past by mathematical programming.
Branch-and-bound applications can be found in [1-3]. Mixed-integer programming
approaches can be found in [4], together with Benders decomposition [5], and with
branch-exchange [6]. Tabu search [7], simulated annealing [8], and dynamic
programming [9] approaches have also been taken. More recently, evolutionary
techniques have also been proposed [10-12]. One motivation for taking evolutionary approaches
is the possibility of addressing complex objectives: the objective function is
sometimes difficult to define as well behaved. However, for large networks the
combinatorial nature of decision-making precludes the canonical genetic algorithm (cGA)
approaches. In the following we point out some difficulties related to the
canonical string genotype (in §2). We note that the string genotype approach is not
adequate to represent network topological relationships. A new genotype information
structure is then proposed together with a recombination algorithm (in §3). In the
proposed genotype approach the problem's topological requirements are genetically
transmitted by recombination. An illustration is provided in §4.



2 Problem Formulation and the cGA Difficulties 

Many real-world network planning problems are characterized, as optimization
problems, by (1) a non-convex, non-separable objective function of the arc
decisions, and (2) topological constraints like radiality and connectivity. Take (q) as a
possible subproblem formulation.

(q) minimize $f(T)$ over all $T \in R(G)$

where

f: objective function
G: set of graph arcs
T: set of tree arcs

R(G): space of radial trees for graph G

The (q) problem can be stated as connecting all nodes by selecting a spanning tree T
(out of graph G) to minimize f. In the following we refer to network planning as a
(q)-like problem. For several reasons, such as simplicity of analysis and definition of
genetic operators, the binary or bit string representation of solutions has dominated
genetic algorithm research. A conceivable canonical genotype approach defines a
graph as an array of arcs, and uses binary bits to select the arcs that form a solution
tree. However, the problem requires satisfaction of non-trivial constraints, such as
radiality and connectivity, which the canonical crossover cannot transmit to the
offspring. We point out two difficulties for the canonical approach.

D1. Topological properties such as connectivity and radiality are not genetically
transmitted to the descendants by one-point crossover: the descendant populations contain
a significant number of non-feasible solutions.

D2. Important similarities between solutions can hardly propagate by crossover: the
one-point crossover operator destroys meaningful network building-blocks, as
adjacent graph arcs are generally impossible to place close to each other in a string.
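Difficulty D1 is easy to reproduce on a toy instance. The Python sketch below encodes a solution as one bit per arc, checks feasibility with a union-find test (exactly n-1 selected arcs and no cycle), and applies one-point crossover to two feasible parents. The four-node graph, the arc ordering and the function names are hypothetical choices made for illustration.

```python
def is_spanning_tree(n_nodes, arcs, bits):
    """Return True if the arcs selected by bits form a spanning tree."""
    selected = [arc for arc, bit in zip(arcs, bits) if bit]
    if len(selected) != n_nodes - 1:
        return False
    root = list(range(n_nodes))

    def find(u):
        # Follow parent links to the set representative, halving the path.
        while root[u] != u:
            root[u] = root[root[u]]
            u = root[u]
        return u

    for u, v in selected:
        ru, rv = find(u), find(v)
        if ru == rv:          # arc closes a cycle: radiality is violated
            return False
        root[ru] = rv
    return True


def one_point_crossover(p1, p2, cut):
    """Canonical one-point crossover on two equal-length bit lists."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


# Complete graph on 4 nodes, arcs listed in an arbitrary order.
ARCS = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3)]
parent_a = [1, 1, 1, 0, 0, 0]   # path 0-1-2-3
parent_b = [0, 0, 0, 1, 1, 1]   # arcs 0-3, 0-2, 1-3
child_1, child_2 = one_point_crossover(parent_a, parent_b, cut=2)
```

Both parents are spanning trees, yet one child selects five arcs and the other only one, so neither descendant is feasible.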

D1 has been reported before [12]: corrective procedures are possible if non-feasible
solutions are rare occurrences, which is not the case here. D2 has not been
reported in such a context, although it represents a crucial obstacle to the success of
large-scale optimization processes [13,14]. One could think of improving the
representation by developing a complex coding function. However, that is difficult to do
without introducing non-linearities or other kinds of search bias into the process
[14,15]. We use a more natural problem-related representation, described in the
following.



3 Evolutionary Approach 

We take a genotype space to be a partially ordered set (A,≤) [16], i.e. a set where: (i)
a≤a; (ii) a≤b and b≤a implies a=b; (iii) a≤b and b≤c implies a≤c, for every a,b,c∈A.
The element a is called the direct precedent of the element b in A, iff: (i) a≠b; (ii)
a≤b; (iii) there is no element c∈A such that a<c and c<b. The relation is denoted by
bJa. Similarly, an element b is called the direct follower of an element a in A, iff a is
one of its direct precedents.

Trees are particular partially ordered sets: each tree element is directly preceded
by one and only one element; an exception is made for the first element (tree
root), which is not preceded. Take g as a tree T genotype coding function.

g: T → {bJa : ∀ arc(a,b) ∈ T with a<b}

Suppose we want to change tree information by changing the direct precedence
of elements of T. If we want to keep T as a partially ordered set after the change, we must
guarantee that the three above-mentioned properties (i), (ii) and (iii) hold after the
precedence change. If the properties hold, we say the change is consistent. Let p(b)
be the direct precedent of b in T, and F(b) be the set of direct followers
of b in T. Take Lemma 1 to identify non-consistent changes.

Lemma 1. A direct-precedence change bJa of b taken over a tree ordered set (T) violates
order (i-ii-iii) iff b<a.

Proof: Sufficiency. If b<a there exists in T a direct ordered sequence like
aJxJyJ...Jb. A change bJa forces a circulation aJxJyJ...JbJa, and thus an order
violation (property ii). Necessity. If b<a does not apply, either (1) a<b, or (2) no
order exists between a and b. In case (1), a change bJa eliminates the order
relationships between every x: a<x<b and every y: b<y by eliminating the existing precedence of b. The
order of the x-elements is not changed; they remain followers of a. Similarly, the
y-elements remain followers of b and, by the change bJa, also followers of a. In
case (2), a change bJa forces b to become a follower of a, and thus every y: b<y
becomes a follower of a, instead of being a follower of the existing p(b).

□ 

Lemma 1 allows precedence changes to be classified as consistent or non-consistent.
When consistent, a precedence change is a possible way to change tree information
while guaranteeing network radiality and connectivity. Moreover, information can be easily
changed by a simple exchange procedure over the set of precedence elements, i.e. the
genotype. Precedence change resolves D1.
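The test in Lemma 1 amounts to an ancestor check. The Python sketch below stores a tree as a dictionary that maps each element to its direct precedent (the root has no entry) and declares a change bJa consistent unless b already precedes a. The dictionary representation and the function names are assumptions of this sketch; the sample tree is the solution T used in the recombination example of this section.

```python
def precedes(parent, x, y):
    """Return True if x < y, i.e. x lies on the path from y up to the root."""
    node = parent.get(y)
    while node is not None:
        if node == x:
            return True
        node = parent.get(node)
    return False


def is_consistent(parent, b, a):
    """A precedence change bJa is consistent unless b < a (or a is b itself)."""
    return a != b and not precedes(parent, b, a)


# Tree T = {cJa, bJc, dJc, eJd, fJe}, stored as element -> direct precedent.
tree = {"c": "a", "b": "c", "d": "c", "e": "d", "f": "e"}

d_under_e = is_consistent(tree, "d", "e")   # d < e, so this would create a cycle
b_under_a = is_consistent(tree, "b", "a")   # a < b, the change is allowed
e_under_b = is_consistent(tree, "e", "b")   # no order between b and e, allowed
```

Because only the chain of precedents of a is inspected, the check costs at most the depth of the tree.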

It is known that a recombination procedure must be able to interchange important
similarities between solutions (meaningful building blocks) [14]. A meaningful tree
building-block is a set of adjacent arcs: a possible simple one is a path between two
network elements (nodes), i.e. a sequence of direct precedence relationships. As a
sequence of precedence changes, a path can be successfully submitted to a tree. Note
that a path should not be rejected just because some of the constituent precedence
relationships fall, in isolation, under the conditions of Lemma 1: a path is not just a set
but a partially ordered set, and thus precedence change consistency must also be tested
in order. Path interchange resolves D2.

Path interchange algorithm
(Submit a path P to a tree T)

Denote by F(x∈P) the set of followers of x in the path P. Name a the path smallest
element (a≤x, ∀x∈P).

Step 1. Change T by changing every x-precedence to xJa iff (i) x∈F(a∈P) and (ii)
xJa is consistent.

Step 2. Update T, and repeat Step 1 taking F(a∈P) as the followers of the x elements,
i.e. F(a∈P) = ∪F(x∈P).

□

Recombination algorithm

(Interchange paths between solutions T and T')

Step 1. Randomly select two nodes, a and b.

Step 2. Find the paths P' and P'' between a and b: P' in T and P'' in T'.

Step 3. Submit P' to T', and P'' to T.

□

Recombination example
(Recombination of solutions T and T')

Solutions T and T' are represented in Fig 1 (a) and (b) respectively. The descendants
are represented in (c) and (d) respectively. P is the path between the two randomly
selected elements b and d. Take a as the solutions' smallest element. The procedure is
summarized in the following:

Step 1. Represent the solutions as tree ordered sets
T = {cJa, bJc, dJc, eJd, fJe} and T' = {bJa, eJa, cJe, dJe, fJe} and the
paths P' in T and P'' in T' as P' = {bJc, dJc} and P'' = {bJa, eJa, dJe}.

Step 2. Submit P' = {bJc, dJc} to T' = {bJa, eJa, cJe, dJe, fJe}.
Element c is the smallest element of path P'. F(c) = {b, d}. Both bJc and dJc
are consistent changes in T' as no order relation exists between such pairs.
Changing precedence results in the tree {bJc, eJa, cJe, dJc, fJe}.

Submit P'' = {bJa, eJa, dJe} to T = {cJa, bJc, dJc, eJd, fJe}. Element a is
the smallest element of path P''. F(a) = {b, e}. Both bJa and eJa are
consistent changes as in T: a<b and a<e. Changing precedence results in the tree
update {cJa, bJa, dJc, eJa, fJe}. Only element e has a follower in P'':
F(e) = d. The change dJe is now consistent (note that it was not before the
update). The change results in the tree {cJa, bJa, dJe, eJa, fJe}.

□

Fig. 1. Solutions T and T' are represented in (a) and (b) respectively. The descendants are
represented in (c) and (d) respectively. Descendants result from path interchange between
elements b and d.



4 Illustration 

We take a complete graph problem as a theoretical example: it maximizes building-block
diversity over a fixed number of nodes, which makes the correct combination of
building-blocks unlikely as a random event. We took G as a complete graph with 10 nodes
and 45 arcs. To better illustrate the rule of recombination we produce a favorable
environment for the discrimination of the best building-blocks: we took f to guarantee a
small building-block cost variance, or so-called small collateral noise [17] (f(T) < f(T')
for every T with N correct arcs and T' with N-1 correct arcs). Convergence is
illustrated in Fig 2-a,b.

We also took a canonical approach to address the same problem, Fig 3-a,b. Note
that for the canonical approach feasibility (i.e. radiality and connectivity) can drop
considerably in the process. The results concern a population of 60 solutions and
non-elitist binary tournament selection [18]. We did not use mutation.

Fig. 2. Evolution of the genotypes along 20 generations: convergence to the optimum (proposed
approach). Plots of contour lines show the evolution of arc frequency along generations. The
inner lines refer to a higher frequency. (a) Full population: note that in the first generations all
arcs are present with a low frequency. In the last generations only nine arcs are present, with a
high frequency. (b) Best solution: note that the optimal solution is found after 12 generations
(four correct arcs in gen-1, five in gen-2, six in gen-3, seven in gen-4, eight in gen-11, and all
nine arcs in gen-12).

Fig. 3. Evolution of the genotypes along 21 generations: convergence to a non-optimum (cGA
approach). Plots of contour lines show the evolution of arc frequency along generations. (a)
Full population: note that (1) the population presents a significant number of solutions with
optimal arcs (36,40) in the last 10 generations, and (2) those arcs are not present in the
last-generation solution (they are lost). (b) Best solution: the optimal solution is not found. The
cGA is not able to find a solution with more optimal arcs: the best first-generation solution
presents four correct arcs (1,7,22,36), as does the best last-generation one (1,9,22,45).

The results illustrate some differences between the proposed genotype approach and
the canonical string one. Namely, the optimum is found by the proposed approach
after 12 generations, whereas the canonical approach was unable to improve any solution
in 21 generations, despite having access to a significant number of easy-to-discriminate
building-blocks. These results show how GA performance can be enhanced by
effectively combining important solution building-blocks and ensuring feasibility.



5 Conclusion 

We reported key specifics of the genotype space and the recombination algorithm
involved in the development of an evolutionary approach to radial topology
constrained problems. The major innovations of our proposal are that: (1) the tree
genotype information is taken as a partially ordered set of nodes instead of
an array of arcs; and (2) the recombination process exchanges network path
information between trees instead of swapping string segments between solutions.



Abstract. The allocation of office space in any large institution is usually a
problematic issue, which often demands a substantial amount of time to perform
manually. The result of this allocation affects the lives of whoever makes use of the
space. In the higher education sector in the UK, space is becoming an increasingly
precious commodity. Student numbers have risen significantly over the last few years
and as a result, university departments have grown in size. In addition, universities
have come under increasing financial pressure to ensure that space is utilized as
efficiently and effectively as possible. However, space utilization is only one issue to
take into account when measuring whether or not a particular allocation is of a
sufficiently high quality. The problem of space allocation is further complicated by the
fact that no standard procedure is practiced throughout the higher education sector.
Most institutions have their own standards and requirements, which are often very
different from those of other institutions. Different levels of authority control the domains of
rooms and resources in different institutions. The most common situation is where a
central university office controls a number of faculties, each managing a number of
departments. This paper will focus specifically on applying optimization methods to
departmental room allocation for non-residential space in the higher education sector.
It will look at the use of three methods (hill-climbing, simulated annealing and genetic
algorithms) to automatically generate solutions to the problem. The processing power
of computers and the repetitive search nature of this problem mean that there is great
potential for the automation of this process. The paper will conclude by discussing
and comparing these methods and showing how they cope with a highly constrained
problem.

Keywords. Space Allocation, Hill-Climbing, Simulated Annealing, Genetic Algorithms 



1. Introduction 

The problem of space allocation affects the lives of almost everyone in some way or 
another, whether it is the size or layout of their office or work environment, limited 
parking space or even the organisation of their homes. This paper will deal with the 
problem of efficiently allocating space within academic institutions. 

As student numbers increase and university departments expand, there is significant
pressure on estate managers and departmental heads to ensure space is utilised as
efficiently as possible. Due to the varied requirements and constraints, this task is not
simple. Obtaining just an acceptable solution often takes a large number of man-hours.

Few papers have addressed the problem of space allocation within academic
institutions. Giannikos et al. * state, in their paper on using goal programming for
academic space allocation, that space allocation has received little attention.
Rizman et al.^ presented a goal programming model to reassign 144 offices for 289
staff members at the Ohio State University. At a facilities level, Benjamin et al.^ used
the Analytical Hierarchy Process to determine the layout of a new computer laboratory
at the University of Missouri-Rolla.

The size of the space allocation process is related to the number of resources that 
need to be allocated and the number of rooms available. In the education sector, sizes 
vary from 1600 rooms in 30 buildings to 20000 rooms in 600 buildings"'. 

The organisation of most working environments exhibits specific structural 
properties, such as members of staff being grouped into departments. As such the large 
problem of university wide space allocation lends itself well to decomposition. This 
allows us to partition the problem into smaller clusters, which it is possible to deal with 
in a reasonable time-scale. In real life, the domain of rooms and resources within a 
university are managed by differing authorities, with each level managing a subset of 
the overall domain. These authorities may be at different levels of abstraction with a 
central authority administrating the whole domain. All this occurs with 
communication between the different levels, with lower levels regularly requesting 
more space and competing with other groups on the same level. It is a continually 
evolving problem, made more difficult by the addition and/or removal of 
resources/rooms all the time. 

The actual allocation of resources to areas of space happens in two ways. The first
is the initial allocation of a set of resources to a group of empty rooms. The second
is the addition or removal of resources from a previous allocation. This paper will
concentrate on the first problem of allocating resources to empty rooms, as all the
principles discussed for the first problem hold for the second.



2. The Space Allocation Process 

There are two different levels to the problem of space allocation, a space utilisation 
level and a constraint satisfaction/optimisation level. 



2.1 Space Utilisation 

The main requirement for the higher education sector is to find working space for all 
staff and students. This involves allocating specific areas/rooms for each individual 
resource. The amount of space required is dependent on the level, numbers and 
functionality of each resource. Space guidelines are published by the various 
education authorities and are used within this decision process. However, it is usual 
for universities to adapt these guidelines to suit their own requirements. The Full Time 
Education (FTE) 1987-space standards are the most widely adapted guidelines, with 
square metre values for each type of resource. 

With a listing of all available rooms and sizes, a listing of all resources, their type
(e.g. Professor, Lecture Hall, Secretary, etc.) and these space guidelines, it is possible to
allocate resources to rooms while attempting to maximise the utilisation of the rooms.

The problem is complicated by the fact that not all resources are capable of sharing
rooms with other resources; the majority actually require their own rooms. The
problem then is to maximise the utilisation of the rooms without violating any of the
sharing limitations.



2.2 Constraint Satisfaction 

The most efficient utilisation of space is one which allocates all resources while
minimising the number of rooms and the amount of space wasted by these resources. However, space
utilisation is simply the first part of an efficient space allocation process. There are 
additional constraints which need to be taken into account: 

• Resource specific requirements 

Unique facilities that are required by certain resources impose a limit on which 
rooms that resource can be allocated to. For example, lecture theatres may require 
disabled access or Audio Visual Aid (AVA) facilities, etc. The option of 
modifying rooms through building work may be available, but it is more practical 
(and less expensive) to use rooms which already have the facilities required. 

• Ensuring grouping and close proximity of resources 

There are two types of spatial grouping requirements: Adjacency and proximity. 
There is always a requirement to place resources belonging to the same group in 
close proximity, (e.g. members of the same department should not be widely 
spread over many buildings). Likewise, resources, which are highly dependent on 
each other, must be adjacent or at least close to each other (e.g. Wash Rooms must 
be adjacent to Operating Theatres etc.). 

• Ensuring distance between conflicting resources 

This is the direct opposite to the previous requirement. It is often desirable that 
resources which conflict in some way should be kept at a distance from each other. 
For example, library space should not be located next to engineering workshops 
due to noise problems. 

Each university will have its own constraints and opinions as to what requirements 
should be satisfied to make a good allocation of space. 

The constraints that cause the most complexity as far as optimisation is concerned 
are the ones involving spatial locations. The ability to cope with these types of 
constraints requires information regarding the location and distance of all the rooms 
within the university. These graphs (fig. 1) of rooms and distances can be decomposed 
into subsets of buildings and floors, but even floors can hold many rooms which all 
require distances from each other. Obtaining this information, which is unlikely to be 
available at all universities, is a major task. It could be obtained through floor plans or 
Computer Aided Design (CAD) drawings, but would require substantial work. 

To reduce the amount of information required in making decisions upon proximity 
information, optimisations could be made to reduce the number of proximity links 
between rooms such as subset grouping^®. By grouping adjacent rooms together tightly 
and then obtaining and storing information regarding the distances of groups of rooms 
from each other, the amount of information can be dramatically reduced. 




Automating Space Allocation in Higher Education 







Fig. 1. Example of Room ID, size and distance information required 



The same subset grouping method can be applied to the resources requiring allocation, 
making the decision process one of overlaying the resource subsets onto the best matching 
room subsets. The minimal method of satisfying proximity and adjacency constraints is to 
only hold information about room adjacencies, not distances. Knowing which rooms are 
adjacent to each other, allows the adjacency constraints to be easily satisfied, whereas the 
proximity constraints can be approximated by finding out whether two rooms are linked 
(by following adjacencies) and by how many rooms. This method allows for a 
compromise between excessive data gathering and constraint satisfaction and is used by 
the algorithms discussed in this paper. 
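
The adjacency-only approach described above can be sketched as a breadth-first search over a room adjacency map. This is a minimal illustration, not the authors' implementation: the room names, the `ADJACENCY` map and the function name are hypothetical, and counting links (rather than intermediate rooms) is an assumption about how the proximity measure is expressed.

```python
from collections import deque


def hops_between(adjacency, start, goal):
    """Return the number of adjacency links on the shortest route between
    two rooms, or None if the rooms are not linked at all."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        room, dist = queue.popleft()
        for neighbour in adjacency.get(room, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical floor: rooms A, B and C along a corridor, D not linked.
ADJACENCY = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}

# An adjacency constraint is satisfied when the hop count is 1; a proximity
# constraint can be approximated by comparing the hop count to a threshold.
print(hops_between(ADJACENCY, "A", "B"))
print(hops_between(ADJACENCY, "A", "C"))
```

Only adjacency pairs need to be stored, which is the compromise between data gathering and constraint satisfaction that the text describes.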



2.3 Evaluation of the space allocation process 

In order to ascertain the quality of a space allocation solution, a measure of the overall 
resource allocation, space utilisation and constraint satisfaction is needed. The 
following equation represents a generalised penalty function for any algorithmic 
method: 



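$$
\mathit{Penalty} = \mathit{ResourcesUnscheduled}
+ \sum_{i=0}^{\mathit{NoOfRooms}-1} \bigl(\mathit{Wastage}(i) + \mathit{SpacePenalties}(i)\bigr)
+ \sum_{x=0}^{\mathit{NoOfResources}-1} \; \sum_{y=0}^{\mathit{NoOfResources}-1} \mathit{ResourceConflicts}(x, y)
$$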

Applying weights allows certain constraints to be considered more important than
others and therefore to have differing penalties associated with them. Table 1 shows the
weighting functions used by the algorithms for each of the sections of the equation
above. Each constraint has an exponent and a factor weighting, allowing greater
versatility in applying penalties. An exponent of one represents a consistent penalty,
i.e. each resource that is unscheduled increases the penalty by 5000. An exponent
greater than one represents a penalty that grows with the size
of the violation, i.e. exceeding room capacity by 2.0 m² increases the penalty by 4,
whereas exceeding it by 15.0 m² increases the penalty by 225.

Constraint                                                                Exponent   Factor
Resources Unscheduled - resources not allocated to rooms                  1.0        5000.0
Space Wastage (per m²) - not using full capacity of room                  1.0        2.0
Space Penalties (per m²) - exceeding room capacity                        2.0        2.0
Sharing Violations - resources sharing when not allowed                   1.0        2000.0
Grouping Violations - members of different groups sharing                 1.0        1000.0
Adjacency Requirements - resources requiring adjacent placing             1.0        500.0
Grouping Requirements - resources requiring same/adjacent room placing    1.2        50.0
Proximity Requirements - resources requiring proximity placing            1.0        750.0

Table 1. Constraint types and penalties used in all algorithms 
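
As a rough sketch of how the weights in Table 1 might be applied, the snippet below combines each violation as factor × size^exponent. That combination rule is an assumption: the text does not state exactly how the exponent and the factor interact, and its worked figures for exceeding room capacity (4 and 225) correspond to the exponent term alone, before any factor. The dictionary keys and function names are hypothetical.

```python
# (exponent, factor) pairs copied from Table 1; the short keys are hypothetical.
WEIGHTS = {
    "unscheduled": (1.0, 5000.0),
    "wastage": (1.0, 2.0),
    "space_penalty": (2.0, 2.0),
    "sharing": (1.0, 2000.0),
    "grouping_violation": (1.0, 1000.0),
    "adjacency": (1.0, 500.0),
    "grouping_requirement": (1.2, 50.0),
    "proximity": (1.0, 750.0),
}


def weighted_penalty(kind, size):
    """Penalty for one violation, ASSUMING penalty = factor * size ** exponent."""
    exponent, factor = WEIGHTS[kind]
    return factor * size ** exponent


def total_penalty(violations):
    """Sum the weighted penalties of (kind, size) pairs."""
    return sum(weighted_penalty(kind, size) for kind, size in violations)


# Two unscheduled resources plus 10 m² of wasted space.
print(total_penalty([("unscheduled", 2), ("wastage", 10.0)]))
```

With an exponent of one the penalty grows linearly (each unscheduled resource adds 5000), while an exponent above one makes large violations disproportionately expensive.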



3. Three Methods for Automating the Problem 

The methods analysed in this paper have been run using real allocation data from the
School of Computer Science at the University of Nottingham. This school is moving
to new premises next year due to expansion and serious space limitation of the current
building; this test data therefore gives a true example of a difficult, highly constrained
problem. The data consists of 83 resources and 52 rooms, with 69 specific constraints.
The 83 resources consist of 1 Lecture Room, 4 Laboratories, 2 Meeting Rooms, 6
Storage Rooms, 3 Professors, 4 Senior Lecturers, 11 Lecturers, 3 Teaching Assistants,
6 Technical Staff, 8 Secretaries and 35 Researchers. The constraints consisted of 42
resources requiring sole occupancy of a room, 7 research groups unable to share with
other groups and requiring same or adjacent rooms, 3 secretaries needing to be
adjacent/close to a manager, 3 technical staff needing to be adjacent/close to
laboratories/workshops and 7 group supervisors needing to be adjacent/close to
research groups.



3.1 Hill-Climbing 

The hill-climbing algorithm consisted of three functions: Allocate Resource, Move 
Resource and Swap Rooms. Allocate resource took an unallocated resource and 
allocated it to a room using the appropriate fit method. Move resource took an already 
allocated resource and reapplied the fit method to find another room. Swap rooms took 
a room and swapped all the resources in that room with the resources in another. All 
functions chose a random source unit to work on and applied one of two fit methods to 
choose the target unit. The first fit method used random selection of rooms (random 
fit), the other chose the room with the greatest reduction in penalty (best fit). 

Algorithm: Hill Climbing

1. Evaluate the current allocation

2. Loop until the current allocation has not improved in n iterations

   a) Select one of Allocate Resource, Move Resource or Swap Rooms
      in that order and apply it to produce a new allocation

   b) Evaluate the new allocation

      i) If it is better, make it the current allocation
      ii) If it is not better, continue
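
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy resource sizes, room capacities, the simplified `evaluate` function and the single `move_resource` operator (a random-fit move) are all hypothetical stand-ins for the real operators and penalty function.

```python
import random

SIZES = [10, 12, 8]       # hypothetical resource sizes (m²)
CAPACITY = [12, 12, 10]   # hypothetical room capacities (m²)


def evaluate(alloc):
    """Toy penalty: 2 per m² wasted, 2 * excess**2 when a room overflows."""
    used = [0] * len(CAPACITY)
    for resource, room in enumerate(alloc):
        used[room] += SIZES[resource]
    penalty = 0.0
    for cap, u in zip(CAPACITY, used):
        penalty += 2.0 * (u - cap) ** 2 if u > cap else 2.0 * (cap - u)
    return penalty


def move_resource(alloc, rng):
    """Random fit: move one randomly chosen resource to a random room."""
    new = list(alloc)
    new[rng.randrange(len(new))] = rng.randrange(len(CAPACITY))
    return tuple(new)


def hill_climb(alloc, moves, patience, rng):
    """Stop once `patience` consecutive iterations bring no improvement."""
    current, current_penalty = alloc, evaluate(alloc)
    stale, step = 0, 0
    while stale < patience:
        move = moves[step % len(moves)]   # cycle through the operators in order
        step += 1
        candidate = move(current, rng)
        candidate_penalty = evaluate(candidate)
        if candidate_penalty < current_penalty:
            current, current_penalty = candidate, candidate_penalty
            stale = 0
        else:
            stale += 1
    return current, current_penalty


start = (0, 0, 0)   # every resource crammed into room 0
best, best_penalty = hill_climb(start, [move_resource], 200, random.Random(0))
print(best, best_penalty)
```

Because only improving moves are accepted, the search can stall in a local optimum, which is the weakness that simulated annealing is meant to address.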

Table 2 shows the results from 20 runs of both variations of the hill-climbing
algorithm, broken down into the main groups of penalties. The average entry is
the average over all 20 runs.

Random fit managed an average utilisation of 76.5% whereas best fit managed 
78.5%. 



                            Random Fit                        Best Fit
Constraints                 Worst      Ave        Best        Worst     Ave       Best
Space Wastage (per m²)      997.8      929.4      508.4       553.2     540.6     512.2
Space Penalties (per m²)    5225.5     3502.4     177.9       932.5     738.9     184.2
Resource Penalties          10272.7    5544.12    500.0       8500.0    4700.0    0.0
Unscheduled Resources       10000.0    5000.0     0.0         0.0       0.0       0.0
Approx. Time Taken          1 minute 20 seconds               1 minute 6 seconds
Total Penalty               26496.0    15538.4    1186.3      9985.7    5979.6    696.4



Table 2. Results from Hill Climbing algorithm tests 



3.2 Simulated Annealing 

Simulated Annealing is an extension to hill climbing that reduces the chance of
converging at local optima by allowing moves to inferior solutions under the control of
a temperature function: an inferior move is accepted when exp(-Δ/T) > R^, where
Δ = the change in the evaluation function, T = the current temperature and
R = a random number between 0 and 1.

The initial temperature value and the rate of cooling affect how the simulated
annealing algorithm performs. Through extensive tests, the combination of an initial
temperature of 2200, a 300-iteration interval and a decrement of 100 performed best.
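
The acceptance test and cooling schedule can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: it reads the "300-iteration interval with a 100 decrement" as lowering the temperature by 100 after every 300 iterations until it reaches zero, and the function names are hypothetical.

```python
import math
import random


def accept(delta, temperature, rng):
    """Always accept an improvement (delta <= 0); accept a worse move
    only when exp(-delta / T) exceeds a random number in [0, 1)."""
    if delta <= 0:
        return True
    return math.exp(-delta / temperature) > rng.random()


def temperatures(initial=2200.0, decrement=100.0, interval=300):
    """Yield the temperature used at each iteration (assumed linear cooling)."""
    t = initial
    while t > 0:
        for _ in range(interval):
            yield t
        t -= decrement


schedule = list(temperatures())
print(len(schedule), schedule[0], schedule[-1])
```

Under this reading the run visits 22 temperature levels, so halving the interval to 150 roughly halves the number of iterations, in line with the time saving reported in the conclusions.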

The simulated annealing tests managed an average utilisation figure of 78.9%. 



                            2200 temp / 300 interval
Constraint                  Worst      Ave        Best
Space Wastage (per m²)      555.2      507.9      475.8
Space Penalties (per m²)    695.56     281.7      0
Resource Penalties          2539.14    1307.6     0
Unscheduled Resources       0          0          0
Approx. Time Taken          7 minutes 46 seconds
Total Penalty               3777.3     2097.2     475.8



Table 3. Results from Simulated Annealing algorithm tests 



3.3 A simple Genetic Algorithm 

Genetic algorithms (GAs) use progressive generations of potential solutions and,
through Selection, Crossover and Mutation, aim to evolve solutions to the problem
through the principles of evolutionary survival of the fittest.

The GA in this paper consisted of a data encoding structured so that each gene
represents a room, with a linked list of all the resources allocated to that room.

Fig. 2. Graphical representation of Genetic Algorithm encoding



The Roulette-Wheel method was used in the Selection process and the Mutation 
operator simply moved a resource from one room to another. However, the Crossover 
operator required more consideration as the standard methods frequently result in 
invalid solutions, i.e. a resource being allocated to two rooms. The method 
implemented into the GA involved checking each room in the parents and where both 
parents had the same resource-to-room allocation, copy that to the child. Otherwise, 
take a resource-to-room mapping from one parent, as long as that resource has not 
already been allocated. 
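
A minimal sketch of such a crossover is given below. It is an interpretation of the description, not the authors' code: each parent is modelled as a list of rooms holding a plain Python list of resources (instead of a linked list), and choosing the donor parent at random for each room is an assumption, since the text does not say how the parent is picked.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Build a child in which no resource is allocated to two rooms.
    parent_x[room] is the list of resources allocated to that room."""
    n_rooms = len(parent_a)
    child = [[] for _ in range(n_rooms)]
    placed = set()
    # Pass 1: copy every resource-to-room allocation the parents agree on.
    for room in range(n_rooms):
        for resource in parent_a[room]:
            if resource in parent_b[room]:
                child[room].append(resource)
                placed.add(resource)
    # Pass 2: per room, take the remaining mappings from one parent
    # (chosen at random here), skipping resources already allocated.
    for room in range(n_rooms):
        donor = parent_a if rng.random() < 0.5 else parent_b
        for resource in donor[room]:
            if resource not in placed:
                child[room].append(resource)
                placed.add(resource)
    return child


a = [["r1", "r2"], ["r3"], []]
b = [["r1"], ["r2"], ["r3"]]
child = crossover(a, b, random.Random(0))
print(child)
```

In this sketch a resource can end up unallocated when neither donor choice supplies it; such a child is still valid and would simply incur the unscheduled-resource penalty.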

The GA was tested with various population sizes and with various initial 
populations, the first of random room allocations, the second using the best fit hill 
climbing algorithm and the last using the simulated annealing (S.A.) algorithm. Table 
4 summarises the results for a population size of 50, using elitism to ensure the best 
result so far is not lost. 



                            Best in Initial Population        Best Individual after GA run
Constraints                 Random     Best Fit   S.A.        Random    Best Fit  S.A.
Space Wastage (per m²)      997.8      725.6      536.0       733.2     725.6     536.0
Space Penalties (per m²)    3265.7     254.6      234.2       1532.5    254.6     234.2
Resource Penalties          40251.3    1265.5     500.0       7565.3    1265.5    500.0
Unscheduled Resources       15000.0    0.0        0.0         0.0       0.0       0.0
Total Penalty               59514.8    2245.7     1270.2      9831.0    2245.7    1270.2



Table 4. Results from Genetic Algorithm tests 



4. Conclusions 

The analysis of the figures for the space utilisation layers throughout the tests shows 
exactly how the different requirements and constraints of the space allocation problem 
conflict and work against each other. The more highly constrained a problem is, the 
less likely it is to ensure an acceptable level of utilisation. The methods analysed show 
that the automation of this process can help balance utilisation and constraint 
satisfaction. 

Applying a polynomial-time approximation scheme bin packing algorithm® to the
problem showed how the additional constraints affect the space utilisation. Removing
all the constraints except the space wastage and space penalty constraints allowed the
bin packing algorithm to obtain 97% utilisation. Applying the sharing constraints
reduced this to 82.5%. By comparison, the other three methods, which take all the
constraints into account, managed around 76-79%.

On the full space allocation problem, the three methods all performed in a 
reasonably acceptable manner. The Simulated Annealing algorithm performed the best 
with a minimum penalty of 475.8 allocating all 83 resources with zero room capacity 
penalties and zero resource penalties, and a small variation between worst and best. 
Random Fit Hill Climbing performed the worst with an approximately 25309.7 penalty 
variation between best and worst. In this case, 10000 was the penalty for being unable 
to allocate two of the 83 resources. Best Fit Hill Climbing performs better with a 
9289.3 variation, never failing to allocate all 83 resources. However, Best Fit still has 
problems allocating all those resources without exceeding room capacities. These 
results, specifically the variations in best to worst, emphasise the benefit of simulated 
annealing over hill-climbing methods which regularly get stuck in local optima. 

The results, however, are offset by the amount of time taken by each method. The
hill climbing methods took around 1 minute to finish, whereas the simulated annealing
method took nearer 8 minutes. The run time of the simulated annealing algorithm
can be reduced by halving the cooling interval to 150. This results in a negligible
increase in the average result (+839.7) and an approx. 50% reduction in time taken.

The genetic algorithm managed to obtain reasonable results from the random 
initialisation data, however, it failed to improve on the hill-climbing and simulated 
annealing initialised populations. Variations on the operators used may produce more 
productive results, specifically the crossover operator. It may prove effective to rely 
completely on the mutation operator or a combination of mutation operators, as 
crossover consistently required more work to ensure legal solutions are produced. 

Further work is required to analyse potential improvements from further testing, 
specifically the possible hybridisation of the three methods with each other and with 
other methods such as Tabu-Search. 



Abstract. A K-nearest neighbour (KNN) algorithm in combination with
a genetic algorithm was applied to a medical fraud detection problem.
The genetic algorithm was used to determine the optimal weighting of
the features used to classify General Practitioners’ (GP) practice profiles.
The weights were used in the KNN algorithm to identify the nearest
neighbour practice profiles and then two rules (i.e. the majority rule and
the Bayesian rule) were applied to determine the classifications of the
practice profiles. The results indicate that this classification methodology
achieved good generalisation in classifying GP practice profiles in a test
dataset. This opens the way towards its application in medical fraud
detection at the Health Insurance Commission (HIC).



1 Introduction 

The Health Insurance Commission (HIC) of Australia is responsible for admin- 
istering the Medicare Program for the Federal Government. Medicare provides 
basic medical cover for all Australian citizens and residents and in Financial Year
1995/96 it paid benefits of $6.014 billion. The HIC has a responsibility to
protect the public purse and ensure that taxpayers’ funds are spent wisely on 
health care. 

The HIC uses a number of supervised-learning systems to classify the prac- 
tice profiles of practitioners who participate in Medicare to help identify those 
who are practising inappropriately and those who are involved in fraudulent 
practice. Inappropriate practice includes those who are over-servicing their pa- 
tients by performing more services than is necessary for their medical condition 
or who see their patients more often than is warranted. Fraudulent practice in- 
cludes claiming for services not performed or mis-itemising services to attract a 
higher benefit. An example is up-coding where a practitioner charged for a long 
consultation when a short consultation was conducted with a patient. 

One or more expert consultants, who are pre-eminent in the speciality, such
as GP, are used to identify features, or indicators, which discriminate between
good and bad practice in the speciality for which the supervised-learning system
is developed. Once the features are selected, the consultants then classify the
practice profiles of a sample of practitioners from the speciality using a risk
classification scale ranging from a high risk profile to a low risk one. The classified
sample is then used to train the supervised-learning system.

One challenge the HIC faces is to achieve a high level of unanimity between
the classifications given by the supervised-learning system and those given by
the one or more expert consultants. This is necessary to ensure that the system
emulates the judgements of human experts by learning the classified patterns
in the training data set. The results of the supervised-learning system are likely
to be ignored if they tend to be inconsistent with those of the expert. The HIC
has tested different supervised-learning techniques IB] for classifying
practitioners’ practice profiles to see which ones give the highest agreement rates
between machine-based judgements and those of human experts. The results of
these studies revealed that no one supervised-learning system (e.g. rule based
versus backpropagation neural network) was clearly superior to the others. One
method that can be used to improve the classifications given by various
supervised-learning techniques is the genetic algorithm. Genetic algorithms are well
suited to optimisation problems and can be applied to improve the matches between
the classifications of a system and those of human experts. This
research was conducted to see to what extent using a genetic algorithm to
optimise the weights of features improves the classifications of a supervised-learning
system above those obtained from using equally weighted features.

Genetic algorithms have been used in conjunction with other supervised-learning
methods j6|,S|5| . In this study a genetic algorithm is applied to improve
the classifications obtained using a K-nearest neighbour (KNN) algorithm. The
KNN was selected for this study because it is a widely used profile-matching
technique and it bases its classification of each case on those of its nearest
neighbours using different decision rules. Two rules are examined in this study:
the majority rule and the Bayesian rule. These rules are tested
to see what effect they have on the classifications of cases, while the genetic
algorithm is applied to find the optimal, or near-optimal, weights for each feature
using the distance metric employed in this study.

A sample of GP’s practice profiles is used in the research. GPs are responsible 
for the primary medical care of patients and account for two-thirds of the general 
population of medical practitioners in Australia. The aim of this paper is to 
report the results obtained using the KNN algorithm in combination with a 
genetic algorithm to classify GP practice profiles. 

2 Methodology 

Each GP practice profile contains a number of features which summarise aspects
of a GP’s practice over a year. An example is the total number of medical services
performed in a year. As described previously, the features were selected by one or
more expert consultants based on the ability of the features to discriminate between
good and bad GP practice. For legal and professional reasons it is not possible to
list the features used to identify practitioners who are practising inappropriately.
There were 28 features used in the GPs’ practice profiles reported in this study.

Each GP profile was given a risk classification of either 1 or 2 by the
consultants, with "1" signifying a high-risk profile and "2" a low-risk profile. A sample
of 1500 GP profiles from a HIC database was selected and divided into three
data sets: a training set, a validation set and a test set. The profiles in
the training set were used to provide nearest neighbour examples to train the
KNN classifier. The validation set was used to optimise the weight values and
the test set was used to test the generalisation of the trained KNN classifier.
The training set consisted of 738 profiles, the validation set of 380 profiles and
the test set of 382 profiles.

The statistic used to gauge the effectiveness of the KNN classifier and its variants
was the agreement rate, which is simply the number of cases on which the
classifications of the expert consultants and of the KNN classifier agree,
divided by the total number of cases in the dataset and expressed as a percentage.
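
As a small illustration of this statistic, the snippet below computes the agreement rate for two lists of class labels. The function name and the example labels are hypothetical; only the definition (matching cases over total cases, as a percentage) comes from the text.

```python
def agreement_rate(expert_labels, knn_labels):
    """Percentage of cases where the classifier matches the expert."""
    if len(expert_labels) != len(knn_labels):
        raise ValueError("both label lists must cover the same cases")
    matches = sum(e == k for e, k in zip(expert_labels, knn_labels))
    return 100.0 * matches / len(expert_labels)


# Hypothetical labels: 1 = high-risk profile, 2 = low-risk profile.
print(agreement_rate([1, 2, 2, 1], [1, 2, 1, 1]))
```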

3 K-Nearest Neighbour Classification Technique 

The K Nearest Neighbour classification [1,2] of a sample is made on the basis 
of the classifications of the selected number of k neighbours. The following two 
methods were used in this study to decide the classification of nearest 
neighbouring GP practice profiles using the KNN algorithm: 



3.1 Majority Rule: 

The classification of the nearest neighbours was decided by the number of class 
1 neighbours ($n_1$) compared to the number of class 2 neighbours ($n_2$) among all k 
nearest neighbours. If $n_1 > n_2$ then the classification was 1, and vice versa where 
$n_2 > n_1$. To avoid situations where $n_1 = n_2$, the value k was selected as an odd number. 
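The majority rule can be sketched as a simple vote over the labels of the k nearest neighbours. The function name `majority_rule` is hypothetical, and the sketch assumes the neighbours have already been found.

```python
def majority_rule(neighbour_labels):
    """Return class 1 or 2 by majority vote among the k nearest neighbours."""
    k = len(neighbour_labels)
    if k % 2 == 0:
        raise ValueError("k must be odd so that n1 == n2 cannot occur")
    if any(label not in (1, 2) for label in neighbour_labels):
        raise ValueError("labels must be 1 or 2")
    n1 = sum(1 for label in neighbour_labels if label == 1)  # class 1 count
    n2 = k - n1                                              # class 2 count
    return 1 if n1 > n2 else 2


if __name__ == "__main__":
    print(majority_rule([1, 2, 1]))
```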



3.2 Bayesian Rule: 

The classification of a sample was based on the Bayes rule. With this approach, 
a normal probability distribution function was applied in the neighbourhood of 
each nearest neighbour whose classification is known: 

$$P_1(x_i, x) = G(d(x, x_i)), \quad P_2(x_i, x) = 0 \quad \text{if the sample at } x_i \text{ is class 1}$$
$$P_1(x_i, x) = 0, \quad P_2(x_i, x) = G(d(x, x_i)) \quad \text{if the sample at } x_i \text{ is class 2} \qquad (1)$$

where $G$ denotes the normal probability distribution function and $P_k(x_i, x)$ is 
the probability of being class k at position x given the classification at site i. 
The $x$ and $x_i$ are position vectors in multi-dimensional space. The $d(x, x_i)$ is 
the squared weighted distance between two positions and is calculated as follows: 



$$d(x, y) = \sum_{j=1}^{n} w_j^2 (x_j - y_j)^2 \qquad (2)$$

where n is the number of features and $w_j$ is the weight of the jth feature. The 
probability of being class 1 or class 2 at site x given the known k nearest 
neighbours’ classification is as follows: 



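$$P(1 \mid x) = \frac{M_1(x)}{M_1(x) + M_2(x)}, \qquad P(2 \mid x) = \frac{M_2(x)}{M_1(x) + M_2(x)} \qquad (3)$$

where $M_1(x)$ and $M_2(x)$ are defined as follows: 

$$M_1(x) = P_1 \sum_{i=1}^{K} P_1(x_i, x), \qquad M_2(x) = P_2 \sum_{i=1}^{K} P_2(x_i, x) \qquad (4)$$

where $P_1$ and $P_2$ are the probabilities of being class 1 and class 2 respectively. 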
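The Bayesian rule can be sketched in Python as follows. Several details are assumptions rather than the study's implementation: the kernel is taken to be a zero-mean normal density of the form exp(-d / (2 sigma^2)) applied to the squared weighted distance d, the weights enter the distance squared, and the class priors default to 0.5. All function names are hypothetical.

```python
import math


def weighted_sq_distance(x, y, weights):
    """Squared weighted distance; ASSUMES the weights enter as w_j ** 2."""
    return sum((w ** 2) * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def kernel(d, sigma=0.5):
    """ASSUMED kernel: zero-mean normal density evaluated on squared distance d."""
    return math.exp(-d / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def bayes_rule(sample, neighbours, weights, prior1=0.5, prior2=0.5, sigma=0.5):
    """neighbours: list of (features, label) for the k nearest neighbours.

    Returns (predicted class, probability of class 1).
    """
    m1 = m2 = 0.0
    for features, label in neighbours:
        p = kernel(weighted_sq_distance(sample, features, weights), sigma)
        if label == 1:
            m1 += p  # contributes to class 1 only
        else:
            m2 += p  # contributes to class 2 only
    m1 *= prior1
    m2 *= prior2
    total = m1 + m2
    if total == 0.0:
        raise ValueError("all kernel values underflowed to zero")
    p1 = m1 / total
    return (1 if p1 >= 0.5 else 2), p1


if __name__ == "__main__":
    neighbours = [((0.1, 0.0), 1), ((0.9, 0.9), 2), ((1.0, 1.0), 2)]
    print(bayes_rule((0.0, 0.0), neighbours, weights=(1.0, 1.0)))
```

In this illustrative case a single very close class 1 neighbour outweighs two distant class 2 neighbours, whereas a plain majority vote over the same three neighbours would return class 2.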

Because of the number of features used, it is unlikely that all features are 
equally important, and it is therefore inappropriate to use Euclidean or other 
distance measures which give equal weighting to all features. Different weights 
were therefore applied to the features in the distance equation (equation 2) and 
the optimal values were derived using a genetic algorithm. 



4 Genetic Algorithm 

The genetic algorithm developed by John Holland and associates [3] at the Uni- 
versity of Michigan is a search algorithm based on the mechanics of natural 
selection. The algorithm is used for searching for the optimal, or near optimal, 
solution in a multi-dimensional space using Darwinian natural selection. In this 
study the genetic algorithm was used in the following manner: 
4.1 Selection 



At each iteration, two individuals in the population were selected to produce two 
offspring for the next generation. In what is called the ’steady state’ approach 
to generating offspring, the population was kept static, with most individuals 
retained in the next generation and only two individuals replaced by the two 
newly created offspring. The two new individuals were created through crossover and 
mutation from two parent individuals. The crossover and mutation operators used in this 
study are explained in Sections 4.2 and 4.3. The two individuals that 
were replaced were the least optimal ones in the total population. 
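The steady-state replacement step can be sketched as below. The function name `replace_two_worst` and the data layout (parallel lists of individuals and cost values, lower cost being better) are assumptions made for illustration.

```python
def replace_two_worst(population, costs, offspring, offspring_costs):
    """Replace the two highest-cost individuals with two new offspring.

    Returns the new population and its cost list; all other individuals
    are retained unchanged, as in a steady-state scheme.
    """
    if len(offspring) != 2 or len(offspring_costs) != 2:
        raise ValueError("exactly two offspring are expected")
    if len(population) != len(costs) or len(population) < 2:
        raise ValueError("population and costs must match and hold >= 2 items")
    # Indices ordered from lowest (best) to highest (worst) cost.
    order = sorted(range(len(population)), key=lambda i: costs[i])
    keep = order[:-2]  # drop the two worst
    new_population = [population[i] for i in keep] + list(offspring)
    new_costs = [costs[i] for i in keep] + list(offspring_costs)
    return new_population, new_costs


if __name__ == "__main__":
    pop, cst = replace_two_worst(["a", "b", "c", "d"], [3.0, 9.0, 1.0, 7.0],
                                 ["e", "f"], [2.0, 4.0])
    print(pop, cst)
```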

The selection of the parent individuals was random, with the more optimal ones 
having a higher probability of being selected than the less optimal ones. To do 
this, the whole population was ranked in ascending order in terms of 
their cost function value. The derivation of the cost function is explained 
in Section 4.4. A geometric series was created with common ratio q. 
With total population N, the series was as follows: 



$$a, aq, aq^2, \ldots, aq^{N-1} \quad (q < 1), \qquad \text{where } a = \frac{1 - q}{1 - q^N} \qquad (5)$$



The probability of selecting the most optimal individual is $a$, the second is 
$aq$, the third $aq^2$, and so on. This selection procedure favoured the more 
optimal strings being selected for reproduction. 
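The rank-based selection probabilities of equation (5) can be checked numerically. In this minimal sketch the function names are hypothetical; the default q = 0.8 follows the parameter value reported in Section 4.4.

```python
import random


def selection_probabilities(n, q=0.8):
    """Probabilities a, aq, ..., aq^(n-1) for individuals ranked best to worst."""
    if n < 1 or not 0.0 < q < 1.0:
        raise ValueError("need n >= 1 and 0 < q < 1")
    a = (1.0 - q) / (1.0 - q ** n)  # normalises the series to sum to 1
    return [a * q ** rank for rank in range(n)]


def select_parent_index(n, q=0.8, rng=random):
    """Draw the rank (0 = best) of one parent according to the series."""
    probs = selection_probabilities(n, q)
    return rng.choices(range(n), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = selection_probabilities(10)
    print(sum(probs), probs[:3])
    print(select_parent_index(10, rng=random.Random(0)))
```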



4.2 Crossover 

In the crossover, the two new individuals were formed using both parents which 
had been selected in the selection process. The $n_1$ weight values from one 
individual (the father individual) and $n - n_1$ from another individual (the mother 
individual) were selected. The value of $n_1$ was chosen randomly. The crossover 
procedure ensured that the new individuals took some values of weights from the 
father individual and some from the mother individual. 
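One way to realise this crossover is a single random cut point, sketched below. The text does not say whether the $n_1$ father values are contiguous, so the single cut point is an assumption, as are the function name and the choice to make the second offspring the complement of the first.

```python
import random


def crossover(father, mother, rng=random):
    """Single-cut crossover: n1 values from one parent, n - n1 from the other."""
    n = len(father)
    if n != len(mother) or n < 2:
        raise ValueError("parents must have equal length of at least 2")
    n1 = rng.randint(1, n - 1)  # random cut so both parents contribute
    child_a = father[:n1] + mother[n1:]
    child_b = mother[:n1] + father[n1:]
    return child_a, child_b


if __name__ == "__main__":
    print(crossover([1.0] * 5, [2.0] * 5, rng=random.Random(1)))
```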



4.3 Mutation 

After two offspring were formed, small changes were added to or subtracted from the 
values of selected parameters. The weight value for each feature had a certain 
probability of being changed. The extent of the change x was decided by a random 
number which had a normal probability distribution, as shown in equation 
6, with a mean value $\mu$ and deviation $\sigma$: 

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (6)$$
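The mutation step can be sketched as follows. The defaults (mutation probability 0.5, mean 0.2, deviation 1) follow the parameter values reported in Section 4.4; choosing the sign of the change with equal probability is an assumption, and the function name `mutate` is hypothetical.

```python
import random


def mutate(weights, p_mut=0.5, mu=0.2, sigma=1.0, rng=random):
    """Perturb each weight with probability p_mut by a normally distributed amount."""
    mutated = []
    for w in weights:
        if rng.random() < p_mut:
            change = rng.gauss(mu, sigma)               # size of the change
            sign = 1.0 if rng.random() < 0.5 else -1.0  # ASSUMED: add or subtract
            w = w + sign * change
        mutated.append(w)
    return mutated


if __name__ == "__main__":
    print(mutate([1.0, 1.0, 1.0, 1.0], rng=random.Random(3)))
```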
4.4 Cost Function 

The cost function was defined as the number of mis-classified cases $N_{mis}$ plus a 
regularisation term which is used to avoid the problem of inflation of the weights. 
The cost function is shown in equation 7: 

$$F = N_{mis} + \alpha \sum_{i=1}^{n} w_i^2 \qquad (7)$$

where the constant $\alpha$ is the regularisation coefficient and $w_i$ ($i = 1, 2, \ldots, n$) are 
the weights for all n features. 
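A cost function of this kind can be sketched in a few lines. The quadratic form of the penalty is an assumption (the regularisation term is not fully legible in the source), and the function name `cost` is hypothetical.

```python
def cost(n_misclassified, weights, alpha=0.1):
    """Misclassification count plus an ASSUMED quadratic weight penalty."""
    penalty = alpha * sum(w ** 2 for w in weights)
    return n_misclassified + penalty


if __name__ == "__main__":
    # Larger weights raise the cost even when the error count is unchanged.
    print(cost(10, [1.0, 2.0]), cost(10, [3.0, 4.0]))
```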

The data sets were normalised to lie between 0.0 and 1.0 so that all features 
used share the same range. The parameters used in the genetic algorithm 
and k nearest neighbour were as follows: 

The common ratio of the geometric series used for selection of parents in equation (5) 
is q = 0.8. The values of $\mu$ and $\sigma$ in the normal distribution for mutation in equation (6) 
are 0.2 and 1 respectively, and the mutation probability is 0.5. 

The values of $\mu$ and $\sigma$ for the normal distribution used in the Bayesian rule in equation 
(1) are 0.0 and 0.5 respectively. 



5 Results 

The results of using a genetic algorithm combined with the KNN for classifying 
general practitioners’ practice profiles are listed in Table 1. The results are 
the average over 50 runs, where each run terminated at the 2000th generation. The 
last column lists the agreement rate using Euclidean distance with all weights 
equal to 1. The common difference for the series is taken as three. The graph 
in Figure 1 depicts the cost reduction process for a run using k = 3 and Bayes’ 
rule in making the decision. 



Number of    Regularisation  Decision Rule      Agreement Rate  Agreement Rate  Agreement Rate (%)
nearest      coefficient                        (%)             (%)             Test dataset,
neighbours                                      Validation      Test dataset    Euclidean distance

1            0               Nearest Neighbour  83.16           76.96           69.1
3            0               Majority Rule      83.68           73.82           69.37
3            0.1             Majority Rule      83.95           78.8            69.37
3            0.2             Majority Rule      83.16           76.44           69.37
3            0.3             Majority Rule      83.95           76.18           69.37
3            0.1             Bayes Rule         82.63           75.39           70.68
3            0.1             Bayes Rule         83.16           78.27           70.68
5            0.2             Majority Rule      82.63           77.49           71.21

Table 1. The results of a series of runs using genetic algorithm and KNN. 
The results shown in Table 1 and Figure 1 indicate that: 

The KNN using the weights optimised by a genetic algorithm improves on 
the classification results obtained using the Euclidean distance. 

High agreement rates were obtained using the KNN with both the majority 
and the Bayes’ rules for the validation dataset. The obtained agreement rates 
were in the range of 82 to 84 percent regardless of the number of nearest neigh- 
bours selected and regardless of the regularisation coefficient used. 

The genetic algorithm was very efficient in this application, with only 2000 
generations needed to achieve the desired agreement rate for the validation dataset. 

The agreement rates with the test dataset were in the range of 73 to 79 
percent with some variations in results for different classification rules. The best 
result of 78.8 percent agreement rate was obtained with the majority rule using 
three neighbours with regularisation. The worst result of 73.82 percent agreement 
rate was obtained with the majority rule without regularisation. These results 
are better than those obtained with all features having equal weights. 



Fig. 1. An Example of the Cost Reduction as a Result of Training Using k = 3 and 
Bayes’ Rule. 



6 Discussion 

The results show that the genetic algorithm is very effective in finding a near 
optimal set of weights for the KNN classifier. The results also show that the 
addition of the regularisation term in the cost function helps to prevent large 
values being derived for variable weights, as shown by the test dataset results 
which show good generalisation from the validation dataset using genetically 
trained weights with KNN. The other factors, such as the number of nearest 
neighbours and the classification rule, do not appear to be critical in improving 
the classification results. 

The agreement rates for the KNN used in combination with a genetic algorithm 
are comparable to those obtained from using ripple-down rules and other 
approaches to classify GP practice profiles [6]. For example, the agreement rates 
for the KNN were in the range of 73 to 79 percent, while those for the ripple-down 
rules, which is a case-based classification system, were in the range of 70 to 74 
percent. This suggests that the KNN approach discussed in this paper is at least as 
good as the ripple-down rules for classifying GP practice profiles. 

7 Conclusions 

The results in this study indicate that KNN used in combination with a genetic 
algorithm achieved good generalisation with classifying GP practice profiles in 
a test dataset. This opens the way for its application in solving a real-world 
problem, namely medical fraud detection. Further refinement of the method 
and tuning of its parameters are needed to make it a routine application in 
the Health Insurance Commission. 
Abstract. This paper proposes a genetic-algorithm-based approach for 
finding a compact reference set used in nearest neighbor classification. 
The reference set is designed by selecting a small number of reference 
patterns from a large number of training patterns using a genetic al- 
gorithm. The genetic algorithm also removes unnecessary features. The 
reference set in our nearest neighbor classification consists of selected 
patterns with selected features. A binary string is used for representing 
the inclusion (or exclusion) of each pattern and feature in the reference 
set. Our goal is to minimize the number of selected patterns, to mini- 
mize the number of selected features, and to maximize the classification 
performance of the reference set. The effectiveness of our approach is 
examined by computer simulations on commonly used data sets. 

Key words: Genetic algorithms, pattern classification, nearest neighbor 
classification, combinatorial optimization, multi-objective optimization, 
knowledge discovery. 



1 Introduction 

Nearest neighbor classification (Cover and Hart [1]) is one of the most well- 
known classification methods in the literature. In its standard formulation, all 
training patterns are used as reference patterns for classifying new patterns. 
Various approaches were proposed for finding a compact set of reference patterns 
used in nearest neighbor classification (for example, see Hart [2], Wilson [3], 
Dasarathy [4], Chaudhuri [5]). In those approaches, a small number of reference 
patterns were selected from a large number of training patterns. Recently genetic 
algorithms were employed in Kuncheva [6,7] for finding a compact reference set 
in nearest neighbor classification. Genetic algorithms were also used for selecting 
important features in Siedlecki and Sklansky [8] and for weighting each feature 
in Kelly and Davis [9] and Punch et al. [10]. In Knight and Sen [11], genetic 
algorithms were used for generating prototypes. 

This paper proposes a genetic-algorithm-based approach for simultaneously 
selecting reference patterns and important features. Let us assume that m training 
patterns with n features are given in an n-dimensional pattern space. Our 
problem is to design a compact reference set by selecting a small number of reference 
patterns from the given m training patterns and removing unnecessary 
features from the given n features. In this paper, we first formulate our pattern 
and feature selection problem as a multi-objective combinatorial optimization 
problem with three objectives: to minimize the number of selected patterns, to 
minimize the number of selected features, and to maximize the number of cor- 
rectly classified training patterns by the reference set. Next we explain how a 
genetic algorithm can be applied to our pattern and feature selection problem. 
In the genetic algorithm, a reference set is coded by a binary string of the length 
(n + m). Each bit value of the first n bits represents the inclusion or exclusion of 
the corresponding feature. The other m bit values represent the inclusion or ex- 
clusion of each of the given m training patterns. A fitness value of each reference 
set (i.e., binary string) is defined by the weighted sum of the three objectives. 
Finally we examine the effectiveness of our approach by computer simulations 
on commonly used data sets. Simulation results show that a small number of 
training patterns are selected by our approach together with a few important 
features. 



2 Problem Formulation 

In general, the main goal of pattern classification methods such as statistical 
techniques, machine learning and neural networks is to maximize the prediction 
ability for unseen patterns. Thus their performance is usually measured by error 
rates on unseen patterns. When pattern classification methods are used in the 
context of decision support, knowledge discovery and data mining, it is required 
to present understandable classification knowledge to human users. While the 
classification mechanism of nearest neighbor classification is easily understood 
by human users, the understandability of classification knowledge is not high 
because a large number of training patterns are stored as classification knowl- 
edge. Our goal in this paper is to extract a small number of reference patterns 
together with important features in order to show classification knowledge to 
human users in an understandable form. 

We assume that m training patterns $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$ 
with n features are given in an n-dimensional pattern space, where $x_{pi}$ is the 
attribute value of the i-th feature in the p-th pattern. Let $P_{ALL}$ be the set of 
the given m training patterns: $P_{ALL} = \{x_1, x_2, \ldots, x_m\}$. We also denote the 
set of the given n features as $F_{ALL} = \{f_1, f_2, \ldots, f_n\}$ where $f_i$ is the label of 
the i-th feature. Our problem is to select a small number of reference patterns 
from $P_{ALL}$ and to select only important features from $F_{ALL}$. Let F and P be 
the set of selected features and the set of selected patterns, respectively, where 
$F \subseteq F_{ALL}$ and $P \subseteq P_{ALL}$. We denote the reference set by S. Since S is uniquely 
specified by the feature set F and the pattern set P, the reference set is denoted 
as S = (F, P). In the standard formulation of nearest neighbor classification, 
the reference set is defined as $S = (F_{ALL}, P_{ALL})$ because all the given features 
and patterns are used for classifying a new pattern $x = (x_1, x_2, \ldots, x_n)$. In our 
nearest neighbor classification with the reference set S = (F, P), the nearest 
neighbor $x_{\hat{p}}$ to the new pattern x is found from the pattern set P as 

$$d(x_{\hat{p}}, x) = \min\{ d(x_p, x) \mid x_p \in P \}, \qquad (1)$$

where $d(x_p, x)$ is the distance between $x_p$ and x, which is measured based on 
the feature set F as 

$$d(x_p, x) = \sqrt{\sum_{i \in F} (x_{pi} - x_i)^2}. \qquad (2)$$

If the class of the nearest neighbor $x_{\hat{p}}$ is the same as that of the new pattern x, 
x is correctly classified. Otherwise the new pattern x is misclassified. 

Since our goal is to find a compact reference set that is easily understood 
by human users, the number of selected patterns and the number of selected 
features are minimized. The maximization of the classification performance is 
also considered in the design of the reference set. Thus our pattern and feature 
selection problem is written as follows: 

Minimize |F|, minimize |P|, and maximize NCP(S), (3) 

where |F| is the number of features in F (i.e., the cardinality of F), |P| is the 
number of patterns in P, and NCP(S) is the number of correctly classified 
training patterns by the reference set S = (F, P). In (3), NCP(S) is calculated 
by classifying all the given m training patterns by the reference set S = (F, P). 
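Nearest neighbor classification with a reference set and the count NCP(S) can be sketched in Python as follows. The function names and the data layout (patterns as tuples, F and P as lists of zero-based indices) are assumptions made for illustration.

```python
import math


def distance(xp, x, features):
    """Euclidean distance restricted to the selected feature indices."""
    return math.sqrt(sum((xp[i] - x[i]) ** 2 for i in features))


def classify(x, patterns, labels, selected_patterns, features):
    """Label of the nearest selected reference pattern."""
    nearest = min(selected_patterns,
                  key=lambda p: distance(patterns[p], x, features))
    return labels[nearest]


def ncp(patterns, labels, selected_patterns, features):
    """Number of training patterns correctly classified by the reference set."""
    if not selected_patterns:
        return 0  # an empty reference set classifies nothing
    return sum(
        1 for x, y in zip(patterns, labels)
        if classify(x, patterns, labels, selected_patterns, features) == y
    )


if __name__ == "__main__":
    patterns = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
    labels = [1, 1, 2, 2]
    print(ncp(patterns, labels, selected_patterns=[0, 2], features=[0, 1]))
```

In this toy case two reference patterns (one per class) classify all four training patterns correctly, which is the kind of compact reference set the formulation aims for.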



3 Genetic Algorithms 

For applying a genetic algorithm to our pattern and feature selection problem, 
a reference set S = (F, P) is coded as a binary string with the length (n + m): 

$$S = a_1 a_2 \cdots a_n s_1 s_2 \cdots s_m, \qquad (4)$$

where the first n bits denote the inclusion or exclusion of each of the n features, 
and the other m bits denote the inclusion or exclusion of each of the m patterns. 
The feature set F and the pattern set P are obtained by decoding the string S 
as follows: 

$$F = \{ f_i \mid a_i = 1,\ i = 1, 2, \ldots, n \}, \qquad (5)$$

$$P = \{ x_p \mid s_p = 1,\ p = 1, 2, \ldots, m \}. \qquad (6)$$

The fitness value of the binary string $S = a_1 a_2 \cdots a_n s_1 s_2 \cdots s_m$ (i.e., the 
reference set S = (F, P)) is defined by the three objectives of our pattern and 
feature selection problem in (3). We use the following weighted sum as a fitness 
function: 

$$fitness(S) = W_{NCP} \cdot NCP(S) - W_F \cdot |F| - W_P \cdot |P| \qquad (7)$$

where $W_{NCP}$, $W_F$ and $W_P$ are non-negative weights. 
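Decoding the string and evaluating the weighted-sum fitness can be sketched as below. The function names are hypothetical, indices are zero-based, and the NCP value is assumed to be computed separately and passed in. The default weights follow the values used in the computer simulations.

```python
def decode(bits, n):
    """Split an (n + m)-bit string into selected feature and pattern indices."""
    features = [i for i in range(n) if bits[i] == 1]
    patterns = [p for p in range(len(bits) - n) if bits[n + p] == 1]
    return features, patterns


def fitness(bits, n, ncp_value, w_ncp=5.0, w_f=1.0, w_p=1.0):
    """Weighted sum: reward correct classifications, penalise set sizes."""
    features, patterns = decode(bits, n)
    return w_ncp * ncp_value - w_f * len(features) - w_p * len(patterns)


if __name__ == "__main__":
    bits = [1, 0, 1, 0, 1, 0, 0, 1]  # n = 3 feature bits, m = 5 pattern bits
    print(decode(bits, 3))
    print(fitness(bits, 3, ncp_value=5))
```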
Since the three objectives are combined into the above scalar fitness function, 
we can apply a single-objective genetic algorithm to our pattern and feature 
selection problem. (We can also handle our problem by multi-objective genetic 
algorithms [12] without introducing the scalar fitness function.) In our genetic 
algorithm, first a number of binary strings of the length (n + m) are randomly 
generated to form an initial population. Let us denote the population size by 
$N_{pop}$. Next a pair of strings is randomly selected from the current population. 
Two strings are generated from the selected pair of strings by crossover and 
mutation. The selection, crossover and mutation are iterated to generate $N_{pop}$ 
strings. The newly generated $N_{pop}$ strings are added to the current population 
to form the enlarged population of the size $2 \cdot N_{pop}$. The next population is 
constructed by selecting the best $N_{pop}$ strings from the enlarged population. 
The population update is iterated until a pre-specified stopping condition is 
satisfied. The outline of our genetic algorithm is similar to Kuncheva’s algorithm 
(Kuncheva [6,7]). 
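The population update outlined above can be sketched as a short Python loop. This is an illustration under stated assumptions, not the authors' code: the function name `evolve`, the use of uniform crossover inside the loop, the plain bit-flip mutation with a single probability `p_mut`, and the fixed generation count as stopping condition are all choices made for the sketch.

```python
import random


def evolve(fitness_fn, length, n_pop=50, generations=500, p_mut=0.01, rng=None):
    """Elitist GA: add N_pop offspring, then keep the best N_pop of 2 * N_pop.

    Returns the best string found and the best fitness after each generation.
    """
    rng = rng or random.Random()
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(n_pop)]
    history = []
    for _ in range(generations):
        offspring = []
        while len(offspring) < n_pop:
            a, b = rng.sample(population, 2)  # random pair of parents
            mask = [rng.random() < 0.5 for _ in range(length)]
            children = ([x if m else y for x, y, m in zip(a, b, mask)],
                        [y if m else x for x, y, m in zip(a, b, mask)])
            for child in children:
                for i in range(length):
                    if rng.random() < p_mut:  # ASSUMED plain bit-flip mutation
                        child[i] = 1 - child[i]
                offspring.append(child)
        enlarged = population + offspring[:n_pop]
        enlarged.sort(key=fitness_fn, reverse=True)
        population = enlarged[:n_pop]  # best N_pop strings survive
        history.append(fitness_fn(population[0]))
    return population[0], history


if __name__ == "__main__":
    # Toy fitness: number of ones in the string.
    best, history = evolve(sum, length=12, n_pop=10, generations=20,
                           rng=random.Random(0))
    print(best, history[-1])
```

Because the current population is always part of the enlarged population, the best fitness value never decreases from one generation to the next.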

In our genetic algorithm, we use the uniform crossover to avoid the depen- 
dency of the performance on the order of the n features and the m patterns in 
the string. For efficiently decreasing the number of reference patterns, we use 
the biased mutation (Nakashima and Ishibuchi [13]) where a larger probability 
is assigned to the mutation from “$s_p = 1$” to “$s_p = 0$” than the mutation 
from “$s_p = 0$” to “$s_p = 1$”. That is, we use two different mutation probabilities 
$p_m(1 \to 0)$ and $p_m(0 \to 1)$ for the last m bits of the string, each of which 
represents the inclusion or exclusion of the corresponding pattern in the reference set. 
Since $p_m(0 \to 1) < p_m(1 \to 0)$, the number of reference patterns is efficiently 
decreased by the biased mutation during the execution of our genetic algorithm. 
The biased mutation is the main characteristic feature of our genetic algorithm. 
We use the biased mutation because the number of selected reference patterns 
is to be much smaller than that of the given patterns (e.g., 1/20 ~ 1/40 of the 
given patterns in computer simulations of this paper) . It has been demonstrated 
in Nakashima and Ishibuchi [13] that the number of reference patterns could 
not be efficiently decreased without the biased mutation. Note that we use the 
standard unbiased mutation for the first n bits of the string, each of which rep- 
resents the inclusion or exclusion of the corresponding feature. This is because 
usually the number of given features is not as large as that of given patterns. In 
this case, the biased mutation is not necessary for the feature selection. 
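The biased mutation can be sketched as follows. The function name `biased_mutation` and its argument names are hypothetical; the default probabilities are the values listed for the numerical example in Section 4.1.

```python
import random


def biased_mutation(bits, n, p_feature=0.01, p_1_to_0=0.1, p_0_to_1=0.01,
                    rng=random):
    """Unbiased mutation on the first n (feature) bits, biased on the rest."""
    mutated = list(bits)
    for i, bit in enumerate(bits):
        if i < n:
            p = p_feature   # standard mutation for feature bits
        elif bit == 1:
            p = p_1_to_0    # removing a pattern is more likely...
        else:
            p = p_0_to_1    # ...than adding one
        if rng.random() < p:
            mutated[i] = 1 - bit
    return mutated


if __name__ == "__main__":
    start = [1, 1] + [1] * 20  # 2 feature bits, 20 selected patterns
    print(sum(biased_mutation(start, n=2, rng=random.Random(0))[2:]))
```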



4 Computer Simulations 

In this section, we first illustrate the pattern selection by a two-dimensional 
pattern classification problem. Next we illustrate the pattern and feature selec- 
tion by the well-known iris data. The iris data set is a three-class classification 
problem involving 150 patterns with four features. Then the applicability of 
our approach to high-dimensional problems is examined by wine data. The wine 
data set is a three-class classification problem involving 178 patterns with 13 fea- 
tures. Finally the performance of our approach on large data sets is examined by 







Hisao Ishibuchi and Tomoharu Nakashima 



Australian credit approval data. The credit data set is a two-class classification 
problem involving 690 patterns with 14 attributes. We use the iris data, wine 
data and credit data because they are available from the UC Irvine Database 
(via anonymous FTP from ftp.ics.uci.edu in directory /pub/machine-learning-databases). 
In our computer simulations, all the attribute values were normalized 
into the unit interval [0,1] before applying the genetic algorithm to each data 
set for the pattern and feature selection. 



4.1 Computer Simulation on a Numerical Example 

Let us illustrate our approach by a simple numerical example in Fig. 1 (a) where 
200 training patterns are given from two classes. In Fig. 1 (a), we also show the 
classification boundary by the nearest neighbor classification based on all the 
given training patterns. 





Fig. 1. Simulation result for a numerical example. (a) Nearest neighbor classification by 
all the given patterns. (b) Nearest neighbor classification by selected reference patterns. 

We applied our approach with the following parameter specifications to the 
two-dimensional pattern classification problem in Fig. 1 (a) with 200 training 
patterns. 

String length: 202 (2 features and 200 training patterns), 

Population size: $N_{pop} = 50$, 

Crossover probability: 1.0, 

Mutation probabilities: 0.01 for features, 

$p_m(1 \to 0) = 0.1$ and $p_m(0 \to 1) = 0.01$ for patterns, 

Weight values: $W_{NCP} = 5$, $W_F = 1$, $W_P = 1$, 

Stopping condition: 500 generations. 
After 500 generations, eight patterns were selected by the genetic algorithm. 
Both the given two features were also selected. In Fig. 1 (b), we show the clas- 
sification boundary by the nearest neighbor classification based on the selected 
eight patterns. From this simulation result, we can see that a small number of 
reference patterns were selected by the genetic algorithm from a large number 
of training patterns. 



4.2 Computer Simulation on Iris Data 

In the same manner as in the previous subsection, we applied our approach to 
the iris data. Since the iris data have four attributes and 150 training patterns, 
the string length was specified as 154. The computer simulation was iterated 
10 times using different initial populations. The following average result was 
obtained: 

Average number of selected features: 2.1. 

Average number of selected patterns: 6.3. 

Average classification rate on training patterns: 99.3%. 

From this result, we can see that compact reference sets were obtained by the 
genetic algorithm. For example, six reference patterns with two features (the 
third and fourth features) were selected in nine of the ten trials. An example of 
the selected reference sets is shown in Fig. 2 in the reduced pattern space with 
the selected features $x_3$ and $x_4$. We also show the classification boundary by the 
nearest neighbor classification based on the selected patterns and features. This 
reference set can correctly classify 149 training patterns (99.3% of the given 150 
training patterns). For the iris data, it is well-known that the third and fourth 
features are important. These two attributes were always selected in the ten 
independent trials. 



Fig. 2. Selected patterns and features for the iris data. 
4.3 Computer Simulation on Wine Data 

We also applied our approach to the wine data in order to demonstrate its 
applicability to high-dimensional classification problems. Computer simulation 
was performed in the same manner as in the previous subsections. Since the 
wine data have 178 patterns with 13 features, the string length was specified 
as 191. The computer simulation was iterated 10 times using different initial 
populations. The following average result was obtained: 

Average number of selected features: 6.9. 

Average number of selected patterns: 5.4. 

Average classification rate on training patterns: 100%. 

From this result, we can see that compact reference sets were obtained by the 
genetic algorithm for the wine data with many features. 



4.4 Computer Simulation on Credit Data 

We also applied our approach to the credit data in order to demonstrate its 
applicability to large data sets with many training patterns. Since the credit 
data have 690 patterns with 14 features, the string length was specified as 704. 
Much longer strings were used for the credit data than the iris data and the wine 
data. This means that the search space of the genetic algorithm for the credit 
data is much larger than the cases of the other data sets. In the application to 
the credit data, we used the following parameter specifications to handle such a 
large search space: 

Population size: $N_{pop} = 50$, 

Crossover probability: 1.0, 

Mutation probabilities: 0.01 for features, 

$p_m(1 \to 0) = 0.1$ and $p_m(0 \to 1) = 0.005$ for patterns, 

Weight values: $W_{NCP} = 5$, $W_F = 1$, $W_P = 1$, 

Stopping condition: 2000 generations. 

The computer simulation was iterated five times using different initial popula- 
tions. The following average result was obtained: 

Average number of selected features: 6.8. 

Average number of selected patterns: 38.6. 

Average classification rate on training patterns: 92.6%. 

Since there is a large overlap between two classes in the credit data, the average 
classification rate is smaller than the cases of the other data sets. The average 
classification rate can be increased by assigning a large value to the weight for 
the classification performance (i.e., $W_{NCP}$). For example, we had a 94.2% average 
classification rate by specifying the weight values as $W_{NCP} = 20$, $W_F = 1$ 
and $W_P = 1$. In this case, the number of selected patterns was increased from 
38.6 to 76.0. As we can see from these simulation results, there is a trade-off 
between the classification performance of the reference set and its compactness. 
5 Conclusion 

In this paper, we proposed a genetic-algorithm-based approach to the design of 
compact reference sets in nearest neighbor classification. In our approach, feature 
selection and pattern selection are simultaneously performed by a genetic algo- 
rithm. That is, a small number of reference patterns with only important features 
are selected. The effectiveness of our approach was demonstrated by computer 
simulations on commonly used data sets. The number of selected patterns was 
1/20 ~ 1/40 of the given training patterns in our computer simulations. 
About half of the features were also removed. 
Abstract. The cellular genetic algorithm (CGA) combines GAs with 
cellular automata by spreading an evolving population across a pseudo- 
landscape. In this study we use insights from ecology to introduce new 
features, such as disasters and connectivity changes, into the algorithm. 
We investigate the performance and behaviour of the algorithm on stan- 
dard GA hard problems. The CGA has the advantage of avoiding prema- 
ture convergence and outperforms standard GAs on particular problems. 
A potentially important feature of the algorithm’s behaviour is that the 
fitness of solutions frequently improves in large jumps following distur- 
bances (culling by patches). 



1 Introduction 

Genetic algorithms (GAs) are search and optimization techniques that are based 
on the analogy of biological evolution a, i, |10| . One of the great attractions 
of this approach is that natural selection has succeeded in producing species 
that solve the problem of survival and are often highly optimized with respect 
to their environment. However the traditional GA approach is only a simplified 
version of what really occurs in nature. For instance genes and chromosomes are 
organized differently, and population dynamics in landscapes introduces added 
complexity. An important question, therefore, is whether algorithms that more 
closely mimic the evolutionary process convey any advantages over simple GAs. 
In this study we begin to address this question by investigating the performance 
and behaviour of a GA that embodies features of evolution in a landscape. 

Traditional GAs evolve a population of individuals over time by selecting 
mates from the entire population (either at random or based on some fitness 
measure). Loss of population diversity (convergence) reduces the quality of many 
solutions. Many ad hoc schemes have been introduced to continually change 
genetic parameters in order to preserve diversity m- 

Parallel genetic algorithms (PGAs) attempt to improve the performance of 
GAs by restricting mating to subpopulations of individuals [3], [11]. The spatial 
population structure employed by PGAs helps to maintain diversity in a more 
natural manner. Typically PGAs utilise static population structures that are 
specified at the beginning of the run and remain unchanged. Here we develop 
an approach to PGAs in which changes in topology, brought about by varying 
the proportion of individuals alive in the population, play a crucial part in 
the operation of the algorithm. This approach is based on recent findings about 
evolution in a landscape 0 and builds on findings into the critical nature of 
connectivity 0, i and especially for cellular automata models of landscapes. 
The proposed cellular genetic algorithm (CGA) highlights the importance of a 
spatial population structure for the evolution of solutions to complex problems. 



2 Parallel Genetic Algorithms 

The PGA is a parallel search with information exchange between individuals 
within a spatially distributed population. There are two categories of PGAs: 
coarse-grained and fine-grained. In coarse-grained PGAs, subpopulations are 
introduced that work independently with only occasional exchanges of individuals 
(migration). They are also known as “distributed” GAs or “island” models [1]. 
In fine-grained PGAs the spatial distribution of the population is defined, and 
selection and crossover are restricted within the individual’s neighbourhood [8]. 

As a consequence of local selection in PGAs, there is less selection pressure 
and a tendency towards more exploration of the search space. Critical parameter 
settings include migration rate and interval, topology and the ratio of the radius 
of the neighbourhood to the radius of the whole grid [3]. 

Various PGAs have been proposed to tackle optimisation problems. Manderick 
and Spiessens [8] described a fine-grained PGA that uses a local selection 
scheme where a mate was randomly selected based on the local fitness values. 
The algorithm was implemented on a sequential machine, therefore it is not a 
truly parallel algorithm. Muhlenbein et al d] use local selection too, but it 
is limited to a small neighbourhood (only 4 neighbours). Muhlenbein also uses 
hill-climbing for each individual. In recent work by Rudolph and Sprave m, 
a self-adjusting threshold scheme was used to control the selection pressure. 
Branke et al. [2] described a number of global selection schemes for PGAs, but 
these require global knowledge. Lin et al. [7] introduced a hybrid PGA that 
incorporates both coarse-grained and fine-grained PGAs for job shop scheduling problems. 
Yao [15] also describes global optimisation using parallel evolutionary algorithms 
as well as the possibility of using hybrid algorithms to improve performance. 



3 A Cellular Genetic Algorithm 

The algorithm we explore here embeds the evolving population of the GA in a 
cellular automaton (CA). Computationally this is a fine-grained PGA, but with 
certain biologically inspired modifications. Whitley m introduced the term 
“cellular genetic algorithm” (CGA) for this sort of model. However, one important 
difference in our approach is that the individuals only occupy cells in the grid; 
they are not identified with them. We treat the grid as a model of a pseudo- 
landscape, with each cell corresponding to a site in the landscape. At any given 
time each cell may be active (occupied) or inactive. 

Fig. 1. Phase changes in connectivity within a genetic “landscape”. In each case the 
x-axis represents the proportion p of cells in the grid that are occupied. (a) Change in 
the size of the largest connected “patch” of cells. Note the phase change at p = 0.4. (b) 
Variance in the size of the patch in (a) over repeated trials. Note the extreme variance 
at the phase change. (c) Time for an epidemic process to traverse the largest patch. 



Our approach draws on ideas from population dynamics and landscape ecol- 
ogy. Its rationale derives from findings of our previous research on the nature 
of evolution in a landscape. We have shown that structural change in a 
cellular automaton model plays a critical role in many ecological changes and 
species evolution 0 0. For example, a phase change occurs in the connectivity 
of grids as the proportion of active cells changes (Fig. 1). 

The above result has a crucial implication. Simply by changing the propor- 
tion of active cells within the model (so that the number crosses the connectiv- 
ity threshold), we can induce phase changes in genetic communication within a 
population. For example if we randomly assign a certain proportion of the pop- 
ulation as alive, and the remaining as dead, the population of living individuals 
fragments into many patchy subpopulations (Fig. 2). Although the same 
fine-grained local interaction rules apply, the use of patchy subpopulations in this 
case resembles the coarse-grained PGA approach. This enables refinement of so- 
lutions within a subpopulation as well as the accumulation of variations between 
subpopulations. When individuals from different subpopulations spread out and 
mate, fitter hybrids often appear. 
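
The connectivity threshold described above can be illustrated with a short Python sketch. It is not the authors' model: it assumes an eight-cell neighbourhood on a toroidal grid, and the names `random_grid` and `largest_patch` are hypothetical. Each cell is occupied with probability p, and the size of the largest connected patch of active cells is measured.

```python
import random
from collections import deque

# Eight surrounding cells (assumed neighbourhood for this illustration).
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def random_grid(n, p, rng):
    """n x n grid in which each cell is active with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

def largest_patch(grid):
    """Size of the largest connected patch of active cells on a toroidal grid."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for r in range(n):
        for c in range(n):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited active cell.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in OFFSETS:
                    ny, nx = (y + dy) % n, (x + dx) % n
                    if grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.2, 0.3, 0.4, 0.5, 0.6):
        fraction = largest_patch(random_grid(60, p, rng)) / (60 * 60)
        print(f"p = {p:.1f}: largest patch covers {fraction:.2f} of the grid")
```

Running the loop for a range of p values gives a rough picture of how the largest patch grows as the proportion of active cells crosses the threshold.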

The CGA uses a toroidal grid in which the state of each cell consists of 
a binary chromosome as well as a value indicating fitness. The length of the 
chromosome is problem dependent. Each generation consists of updating the 
status of each cell in a series of steps. These steps follow the basic pattern for 
GAs except that the spatial arrangement of cells modifies the process in the 
following important ways. 



1. On each generation each active cell produces an offspring to replace itself. 
In doing so it carries out crossover of its genetic material with one of its 
neighbours. A mate is selected based on its fitness value relative to the local 
neighbourhood. 

2. Varying proportions of individuals are removed at random using a disaster 
option. Normally the disaster is small, but occasionally large numbers are 
wiped out. Control parameters include the rate at which disasters strike a 
generation and the maximum radius of the disaster zone. 

3. Wherever a cell is cleared, the neighbouring cells compete to occupy it. Thus, 
it will take a number of generations to fill the vacant zone after a disaster. 

Fig. 2. Patchy subpopulations of a PGA model. Active cells are shaded, while inactive 
cells are represented by white cells on the 2-dimensional grid. The black area shows 
the extent of the single largest patch of connected active cells in this system. The 
proportion of active cells in this case is set at the critical level (cf. Fig. 1). 
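
The three steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C* implementation: fitness is taken as "higher is better", the mutation rate is applied per bit, the mate is always the fittest neighbour, a vacant cell is taken over by a copy of its fittest active neighbour, and a disaster clears a square zone. The function names are hypothetical.

```python
import random

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def neighbours(grid, r, c):
    """Active individuals in the eight cells around (r, c) on a toroidal grid."""
    n = len(grid)
    cells = (grid[(r + dy) % n][(c + dx) % n] for dy, dx in OFFSETS)
    return [ind for ind in cells if ind is not None]

def crossover(a, b, rng):
    """One-point crossover of two equal-length bit lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate` (assumed per-bit rate)."""
    return [1 - b if rng.random() < rate else b for b in bits]

def generation(grid, fitness, rng, p_cross=0.6, p_mut=0.1,
               disaster_rate=0.1, max_radius=3):
    """One synchronous update; inactive cells are represented by None."""
    n = len(grid)
    new = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            local = neighbours(grid, r, c)
            me = grid[r][c]
            if me is None:
                # Step 3: a cleared cell is colonised by its fittest neighbour, if any.
                new[r][c] = list(max(local, key=fitness)) if local else None
                continue
            # Step 1: the active cell replaces itself with an offspring.
            child = me
            if local and rng.random() < p_cross:
                child = crossover(me, max(local, key=fitness), rng)
            new[r][c] = mutate(child, p_mut, rng)
    # Step 2: occasionally a disaster wipes out a square zone of random radius.
    if rng.random() < disaster_rate:
        cy, cx = rng.randrange(n), rng.randrange(n)
        radius = rng.randint(1, max_radius)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                new[(cy + dy) % n][(cx + dx) % n] = None
    return new
```

Because colonisation only reaches cells that border an active cell, a large cleared zone refills from its edge inwards over several generations, which matches the behaviour described in step 3.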



4 Experiments and Results 

Initial investigations focussed on three different selection schemes within the 
neighbourhood model: (a) a mate is selected at random; (b) a fitter mate is 
randomly selected; and (c) the fittest individual in the local neighbourhood is 
always selected as a mate. The size of the neighbourhood is set to one (eight 
nearest neighbours) in all models. 
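
The three mate selection schemes can be written as one small Python function. This is a sketch only: the text does not say how scheme (b) behaves when no neighbour is fitter, so falling back to a random neighbour is an assumption, as is treating larger fitness values as better. The name `select_mate` is hypothetical.

```python
import random

def select_mate(own_fitness, neighbour_fitness, scheme, rng):
    """Return the index of the chosen mate within the list of neighbours."""
    indices = list(range(len(neighbour_fitness)))
    if scheme == "random":
        # (a) any neighbour, chosen uniformly at random
        return rng.choice(indices)
    if scheme == "fitter":
        # (b) a random neighbour that is fitter than the current individual
        fitter = [i for i in indices if neighbour_fitness[i] > own_fitness]
        return rng.choice(fitter or indices)  # fallback is an assumption
    if scheme == "fittest":
        # (c) always the fittest neighbour
        return max(indices, key=lambda i: neighbour_fitness[i])
    raise ValueError(f"unknown scheme: {scheme}")
```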

For performance evaluation De Jong’s standard test functions F1, F2, F3 
and F5 were used [4]. F4 was excluded due to the stochastic nature of the 
function. The configurations described have been implemented in C* and run on 
a Connection Machine CM-5. The 2-D grid size was set to 10 × 10 (100 individuals). 
In all runs the crossover and mutation rates were 0.6 and 0.1 respectively. 

The CGA was able to find optimum solutions for all test functions. Table 1(a) 
lists the number of generations to reach the optimum solution for each function 
for a given selection technique, averaged over ten runs. For all test functions, 
there is a significant reduction in the number of evaluations required to find the 
optimal solution when the fittest selection technique is adopted. Table 1(b) lists 
performance statistics for each function, based on the fittest selection strategy 
after 100 generations. Here we compare the performance of the CGA with the 
Manderick and Spiessens model [8] in column FG1. There does not appear to be 
any significant improvement in performance for F1 or F2. However, the results for 
F3 and F5 suggest that the CGA not only alleviates the premature convergence 
problem and improves results, but also finds solutions in a shorter time. 
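
The online and offline statistics are not defined in the text. The sketch below assumes De Jong's usual definitions for a minimisation problem: online performance is the mean of all function evaluations so far, and offline performance is the mean of the best-so-far values. The function names are hypothetical.

```python
def online_performance(evaluations):
    """Mean of every objective value evaluated so far."""
    return sum(evaluations) / len(evaluations)

def offline_performance(evaluations):
    """Mean of the running best (lowest) value after each evaluation."""
    best, total = float("inf"), 0.0
    for value in evaluations:
        best = min(best, value)
        total += best
    return total / len(evaluations)
```

For the made-up sequence 4, 2, 3, 1 the best-so-far values are 4, 2, 2, 1, so the offline figure is lower than the online one whenever later evaluations are worse than the current best.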







Michael Kirley, Xiaodong Li and David G. Green 



Table 1. CGA performance measures 

(a) Selection Technique 

Function  Selection  Generation 
F1        Random     - 
F1        Fitter     155.0 
F1        Fittest    64.0 
F2        Random     136.8 
F2        Fitter     131.5 
F2        Fittest    39.8 
F3        Random     141.0 
F3        Fitter     120.5 
F3        Fittest    40.3 
F5        Random     83.0 
F5        Fitter     52.3 
F5        Fittest    50.6 

(b) Comparison 

Function  Performance  CGA       FG1 
F1        Online       12.3358   2.8333 
F1        Offline      0.1659    0.0192 
F1        Best-so-far  0.0001    0.0000 
F2        Online       128.789   47.7812 
F2        Offline      0.1597    0.3032 
F2        Best-so-far  0.0901    0.2573 
F3        Online       8.3120    22.1863 
F3        Offline      3.8200    11.0415 
F3        Best-so-far  0.0000    10.8500 
F5        Online       111.071   168.9408 
F5        Offline      3.4192    14.4214 
F5        Best-so-far  1.0021    9.2834 



Next we investigated the CGA performance using a GA-hard problem. The 
function used is a generalised version of the Rastrigin function. 

$$f(x) = nA + \sum_{i=1}^{n} \left( x_i^2 - A\cos(2\pi x_i) \right)$$

with $-5.12 < x_i < 5.12$, $A = 3$ and $n = 20$. 

This function is predominantly unimodal with an overlying lattice of moderately 
sized peaks and basins. The minimum is $f(0, 0, \ldots, 0) = 0$. 
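
As a quick check of the definition, the function can be written in a few lines of Python. This is a sketch using the standard form of the generalised Rastrigin function with the quadratic term, and the name `rastrigin` is chosen here for illustration.

```python
import math

def rastrigin(x, A=3.0):
    """Generalised Rastrigin function; A = 3 and len(x) = 20 in the experiments."""
    n = len(x)
    return n * A + sum(xi * xi - A * math.cos(2 * math.pi * xi) for xi in x)
```

Evaluating it at the origin gives zero, and every integer lattice point is close to a local minimum whose value grows with the squared distance from the origin.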

The CGA introduces a disaster option, which allows the grid to be broken 
up into patches. The dynamic nature of connectivity allows interactions to be 
restricted to small neighbourhoods which means that good alternative solutions 
can persist by forming small clumps. All individuals can be considered to be 
continuously moving around within their neighbourhoods, so that global com- 
munication is possible, but not instantaneous. Wherever a cell is cleared, the 
neighbouring cells compete to occupy it. 

The 2-D grid size was set to 100 × 100. The large grid is needed to achieve a more 
realistic model. The parameter settings for the preliminary investigations 
include the fittest selection strategy, with crossover and mutation rates of 0.6 and 
0.1 respectively. The disaster rate and maximum disaster zone radius (the range 
or neighbourhood size of the disaster) were systematically altered to 
determine the effects of spatial disturbances on solution quality. The results are 
summarised in Figs. 3 and 4. 

As the disaster configuration parameters are altered, the ability of the CGA to 
find good solutions also changes. The results shown in Fig. 3 indicate that there 
is variation in performance when the spatial population is disturbed. The best 
results occur consistently with a disaster rate in the range 0.1 - 0.2 and a disaster 
zone radius of 30. As expected, disturbances of too great a severity destroy 
the variability of the population, and thus the basis of further advances. 

Fig. 3. CGA average best fitness after 500 generations for the Rastrigin function using 
different combinations of disaster rates and disaster zone radius values. 




Fig. 4. A comparison of CGA performance. The best individual fitness value vs 
generation for the CGA without disasters, averaged over five runs, compared to a typical 
run with disaster rate = 0.1 and disaster zone radius = 30. Disasters are marked 
with an arrow. 



If the disaster option is disabled the CGA functions as a typical fine-grained 
PGA. Other performance measures examined included the average and standard 
deviation of the solution fitness values at the end of the run as well as the time 
to reach the best solution. There does not appear to be significant variation, 
with or without disasters, based on these measures in the CGA. 

Fig. 4 shows the average best-ever results of the CGA with the disaster 
option turned off and an example with disaster rate = 0.1 and disaster zone 
radius = 30. Typically we have rapid progress at the beginning, followed by a 
more gradual improvement in solution quality. In the CGA trial with disasters, 
there is often an increase in solution quality following disasters. For example, 
the graph shows that a significant jump in fitness occurred at approximately 
generation 180 following a disaster. 



5 Discussion 

To summarise the results briefly, the version of the CGA that we have 
introduced here works well on all the standard problems on which we have tested 
it, especially De Jong’s function F5 and the Rastrigin function. Improvement 
in performance is attributable to the use of local neighbourhoods (i.e. the 
fine-grained PGA approach), which reduces the tendency towards premature convergence. 
The introduction of disasters into the algorithm produces some interesting be- 
haviour. In particular we saw that there is a strong tendency for the best fitness 
in the population to increase in jumps following a disaster. This is something 
that we anticipated by analogy with biological systems. Our earlier work 0 , 0 
had shown that disasters in a landscape can lead to explosions of small popula- 
tions that were previously suppressed by their competitors. The jumps in fitness 
arise from hybridisation of these populations when they are able to spread and 
come into contact with each other. 

Lin et al. [7] obtained best results for the job shop scheduling problem using a 
hybrid PGA model consisting of coarse-grained PGAs connected in a fine-grained 
PGA style topology. The CGA with disasters effectively achieves the same thing. 
That is, the regular operation of the algorithm, with mate selection confined to a 
local neighbourhood, is a straightforward fine-grained PGA. Disasters break up 
the grid into isolated patches that are still locally communicating but isolated 
from other patches, i.e. a coarse-grained PGA. The results support the concept 
that emulating biological processes more closely holds the potential to produce 
better algorithms. In this study we have tested the CGA only on standard test 
problems. However, its ideal application is likely to be problems that involve 
optima and criteria which vary in time and space and correspond to the sorts 
of circumstances faced by natural populations. 



Abstract. Unsupervised learning algorithms realizing topographic mappings 
are justified by neurobiology and are useful for multivariate 
data analysis. In contrast to supervised learning algorithms, unsupervised 
neural networks have their objective function implicitly defined by the 
learning rule. When considering topographic mapping as an optimization 
problem, the presence of explicitly defined objective functions becomes 
essential. In this paper, we show that measures of neighborhood preservation 
can be used for optimizing and learning topographic mappings by 
means of evolution strategies. Numerical experiments reveal that these measures 
are also a possible description of the principles governing the 
learning process of unsupervised neural networks. We argue that quantifying 
neighborhood preservation provides a link for connecting evolution 
strategies and unsupervised neural learning algorithms for building hybrid 
learning architectures. 



1 Introduction 

A topographic mapping transforms neighboring points in some 
space into neighboring points in another space, so that the neighborhood relation 
is retained by the transformation. There exist several unsupervised 
learning algorithms performing topographic mappings (some of which we will 
consider in section 3). However, there is no generic framework subsuming the 
different learning dynamics. Identifying the principles underlying such a learning 
process becomes important when designing new learning schemes. 

We will investigate empirically if the quantification of neighborhood preser- 
vation is suitable as a black box description of neural learning dynamics by 
calculating these measures in parallel to the neural learning process. By using 
evolution strategies for optimizing the measures of neighborhood preservation, 
we will see if these measures can be considered as objective functions describing 
topographic mappings. 

2 Quantifying neighborhood preservation 

A recent work by Goodhill and Sejnowski [S] identifies several methods for 
quantifying neighborhood preservation in topographic mappings and provides a 
taxonomy of these measures. The authors have also calculated the values assigned 
by these measures to a set of four static mappings (square to line problem) and 
have done some analysis on parameter variation. 

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 98-105, 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999 

Quantifying Neighborhood Preservation 99 

Here, we will make use of those non-stochastic measures named by Goodhill and 
Sejnowski which do not need additional parameters or target values for a mapping 
and which do not yield binary relationship values. Therefore, we will consider the most 
universal measures. These are briefly explained in the following, where $x_i$ ($y_i$) 
names point $i$ in input (output) space of the mapping, while $N$ is the number of 
points mapped and $\Delta^{(x)}$ ($\Delta^{(y)}$) is a function quantifying similarity of points in 
input (output) space. 



The C measure 

The cost functional C [1] measures the correlation between the distance 
$\Delta^{(x)}$ in input space and the distance $\Delta^{(y)}$ in output space: 

$$C = \sum_{i=1}^{N} \sum_{j<i} \Delta^{(x)}(x_i, x_j)\, \Delta^{(y)}(y_i, y_j) \qquad (1)$$



Metric Multidimensional Scaling 

Metric multidimensional scaling (metric MDS) [Hj is used to match a given set of 
dissimilarities of measurements with the dissimilarities of points resulting from a 
transformation of the initial measurements. The objective function defining the 
quality of matching is the summed squared differences of dissimilarities: 

$$mMDS = \sum_{i=1}^{N} \sum_{j<i} \left( \Delta^{(x)}(x_i, x_j) - \Delta^{(y)}(y_i, y_j) \right)^2 \qquad (2)$$



Sammon mappings 

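Although similar to metric MDS, Sammon mappings [10] are nonlinear by the 
use of normalization. The Sammon function emphasizes differences involving 
small $\Delta^{(x)}$ values: 

$$S = \sum_{i=1}^{N} \sum_{j<i} \frac{\left( \Delta^{(x)}(x_i, x_j) - \Delta^{(y)}(y_i, y_j) \right)^2}{\Delta^{(x)}(x_i, x_j)} \qquad (3)$$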



Spearman coefficient 

The idea of metric topology preserving (MTP) maps presented in PQ is that 
it preserves the relative positions (ranks) of pairwise similarities among points. 



Ralf Garionis 



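For calculating the deviation from an optimal MTP mapping the Spearman 
coefficient $\rho_{sp}$ is used, which is the correlation coefficient of the ranks [^: 

$$\rho_{sp} = \frac{\sum_{k} (R_k - \bar{R})(S_k - \bar{S})}{\sqrt{\sum_{k} (R_k - \bar{R})^2}\, \sqrt{\sum_{k} (S_k - \bar{S})^2}} \qquad (4)$$

where $R_k$ and $S_k$ are the ranks of the $k$-th pairwise similarity in input and 
output space, and $\bar{R}$, $\bar{S}$ are their means. 

|T] proved that a map is metric topology preserving if $\rho_{sp} = 1$. 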
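
The four measures of this section can be computed from the two lists of pairwise distances. The Python sketch below is an illustration under assumptions rather than the measuring code used in the paper: Euclidean distance is used for both similarity functions, the Sammon sum is written without a normalising prefactor, pairs with zero input distance are skipped, and tied distances receive average ranks. All function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distances for all point pairs, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def c_measure(dx, dy):
    """Sum of products of input and output distances."""
    return sum(a * b for a, b in zip(dx, dy))

def metric_mds(dx, dy):
    """Summed squared differences of input and output distances."""
    return sum((a - b) ** 2 for a, b in zip(dx, dy))

def sammon(dx, dy):
    """Squared differences weighted by the inverse input distance."""
    return sum((a - b) ** 2 / a for a, b in zip(dx, dy) if a > 0)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(dx, dy):
    """Correlation coefficient of the ranks (undefined if all ranks are equal)."""
    rx, ry = ranks(dx), ranks(dy)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

For an identity mapping the metric MDS and Sammon values are zero and the Spearman coefficient is one, which gives a simple sanity check before applying the functions to a learned mapping.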

3 Self-organizing topographic mappings 

Unsupervised learning rules for topographic maps transform points in some con- 
tinuous input space into discrete lattice points in neuron space. Each neuron 
carries a reference (weight) vector mapping its lattice position back into input 
space. We will now briefly explain a selection of learning algorithms fitting in 
this framework. 

Kohonen’s self organizing maps 

The self-organizing feature-mapping algorithm developed by Kohonen [7] 
became a synonym for unsupervised learning algorithms providing topographic 
mappings. Within the one- or two-dimensional lattice of neurons the algorithm 
has to find the neuron that is closest to some network input of arbitrary 
dimension. The reference vector of the neuron found (the winner) is moved by 
some distance towards the input vector. The neurons neighboring 
the winner are moved by an amount that decreases with their lattice distance from the winner. 
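
A single learning step of such a map can be sketched as follows. The Gaussian neighbourhood function and the parameter names `lr` and `sigma` are assumptions made for this illustration; the text only states that the adaptation shrinks with distance from the winner.

```python
import math

def som_step(weights, x, lr=0.1, sigma=1.0):
    """Update all reference vectors for one input x and return the winner.

    weights maps a lattice position (row, col) to a reference vector (list).
    """
    # Find the neuron whose reference vector is closest to the input.
    winner = min(weights, key=lambda pos: math.dist(weights[pos], x))
    for pos in list(weights):
        # Squared lattice distance between this neuron and the winner.
        d2 = (pos[0] - winner[0]) ** 2 + (pos[1] - winner[1]) ** 2
        h = math.exp(-d2 / (2 * sigma ** 2))  # assumed Gaussian neighbourhood
        weights[pos] = [w + lr * h * (xi - w) for w, xi in zip(weights[pos], x)]
    return winner
```

In practice both `lr` and `sigma` are usually reduced over time so that the lattice first unfolds and then settles.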

The Folk and Kartashov elastic network model 

Based on elastic interaction between neurons, the network model of Folk and 
Kartashov [2] moves the neuron reference vectors according to the “forces” acting 
on neighboring neurons. These forces are calculated by using the input signal 
density inside the Voronoi cells defined by the neurons’ weight vectors, which 
are also considered in terms of distances. The network can perform all weight 
adaptations required for a single learning step in parallel. 

Linsker’s maximum mutual information network 

The idea of maximizing the Shannon information rate (average mutual informa- 
tion) of an input-output mapping was used by Linsker for deriving a learning 
rule that performs gradient ascent in the information rate [S]. The algorithm 
explicitly addresses lateral connections and requires the calculation of distances 
from input vectors to reference vectors. While Linsker uses exponentially 
weighted Euclidean distances, distance measurement has to recognize that 
opposite-side neuron grid borders are connected to each other and therefore form a torus. 






4 Evolution strategies 

Evolution strategies are known to be well suited for solving difficult real valued 
optimization problems m- For our numerical experiments we have chosen a par- 
ticular instance of parallel evolutionary algorithms, the neighborhood evolution 
strategy (NES) [11]. 

The neighborhood model of the NES places the individuals of the popula- 
tion on a grid being folded in such a way as to form a two-dimensional torus. 
Therefore, the grid defines the neighborhood relations among individuals. A par- 
ticular neighborhood encloses all individuals surrounding an individual within 
some fixed distance in the maximum norm. 

As local selection operator we successfully used mating selection. The two 
best individuals are selected from the neighborhood in order to generate the 
successor of the current individual in the next generation. Therefore, the 
communication among individuals within a certain neighborhood is purely local. 
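
The neighborhood model and the local mating selection can be sketched in Python. This is an illustration only: it assumes that the neighborhood includes the individual itself and that lower objective values are better, and the function names are hypothetical.

```python
def neighbourhood(r, c, size, radius=1):
    """Grid positions within `radius` of (r, c) in the maximum norm on a torus."""
    return [((r + dy) % size, (c + dx) % size)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]

def select_parents(fitness, r, c, radius=1):
    """Positions of the two best individuals around (r, c); lower is better."""
    size = len(fitness)
    cells = sorted(set(neighbourhood(r, c, size, radius)))
    return sorted(cells, key=lambda p: fitness[p[0]][p[1]])[:2]
```

Since every individual only reads the fitness values of its own neighborhood, the selection for all grid positions can be carried out independently.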

Both the NES and the neural network models considered here share similar 
properties: they are inherently parallel and they are based on local interactions. 
This is a useful prerequisite for hybrid learning systems. 

5 Simulations 

For calculating the measures of section 2 in parallel to the neural learning 
process, we used random points drawn from the bivariate uniform distribution over 
the range [0, 1] and mapped them into an 8 x 8 two-dimensional neuron lattice 
(initially configured randomly) by using the algorithms described in section 3. 
The rules’ parameters were chosen such that the lattice unfolds properly, while 
the number of input presentations required depended on the learning algorithm 
used (note: Linsker’s rule uses batch learning). At each pattern presentation, we 
calculated on-line the measures of section 2 for the last 64 input-output 
mappings (initially fewer than 64) for approximating the mapping of 64 input points 
each addressing another neuron of the 8 x 8 neuron lattice. Because we consider 
Euclidean spaces, Euclidean distances were used for the $\Delta^{(x)}$ and $\Delta^{(y)}$. 

We used the same measuring code for optimizing the point to lattice map- 
pings by the neighborhood evolution strategy. The NES controlled the real valued 
two-dimensional coordinates of 64 points in input space. The measuring func- 
tions kept the real valued coordinates of the 64 reference vectors constant while 
the neighborhood relation of these vectors was defined by the equidistant spaced 
discrete 8x8 two-dimensional grid in output space. Each of the reference vectors 
coordinates were set to | , i = 1 ... 8 for representing the topologically correct 
mapping of the discrete output space grid into input space. Therefore, the NES 
had to find under control of the measures considered the points in input space 
which map perfectly into the output space grid. (Due to this method the abso- 
lute values calculated by the measuring functions in the NES do not match the 
values of the neural network simulations.) As boundary restrictions imposed on 
the optimization the variable values had to stay within the interval [0, 1]. 

For solving this 128 real valued parameter optimization problem we 
combined the NES with a line search algorithm (Hooke-Jeeves, ®) run at different 
frequencies within the evolution strategy’s evaluation loop. 

Note that it is difficult to compare the neighborhood measures gained by the 
neural network models and the NES among each other since the NES operates 
on a fixed set of input points and therefore calculates the optimum of the various 
measures. Because the neural network models use changing input points, they 
calculate the mean of the topographic distortion among the inputs. The 
presence of “wrap around” neighborhoods (Linsker’s network) and “border neurons” 
(elastic network) influences the measure values for the neural mappings. 
Therefore, the slope of the measure curves is most informative regarding the learning 
process. 

6 Discussion 

We have simulated the learning algorithms (realizing topographic mappings) of 
section 3, which differ regarding the principles underlying the learning rules. In 
parallel to the learning process, we have calculated the neighborhood 
preservation measures of section 2 for quantifying the process of learning. Figures 3 
to 5 show that at least metric MDS and Sammon mappings can be considered 
as Lyapunov functions describing the learning dynamics of the three network 
models: These functions are clearly minimized. The results for metric MDS and 
Sammon values are confirmed by the topology preserving mappings gained by 
the NES (figures El to C]) using local mating selection (as described in P!) and 
a population size of 100 individuals. 

In the neural network simulations, the on-line calculated values for the C 
measure and the Spearman coefficient $\rho_{sp}$ do not show changes significant enough to 
characterize the learning process (figures 3 to 5). In addition, the NES was not 
able to generate a topology preserving mapping under control of the C measure 
and the Spearman function. This gives rise to the assumption that the signal 
for directing the process of learning topological mappings provided by these two 
functions is too weak and that the values they calculate do not provide a strong 
causal relation to the development of topological mappings. 

The mapping obtained for Sammon’s measure (fig. 7) takes longer to learn 
(fig. 2) than the mapping evolved for the mMDS case (fig. 6). Some preordering 
of the initial grid configuration could speed up learning. 

Furthermore, the line search optimization algorithm used for supporting the 
NES failed to converge to a valid solution for any of the functions used. 

7 Conclusions 

Summarizing, metric MDS and Sammon mappings are suited for guiding the 
evolutionary learning process of an evolution strategy and for explaining the 
topology preserving learning process of a variety of neural network learning 
algorithms by means of a Lyapunov function. As a combined result of the simulations, 
we argue that metric MDS and Sammon mappings do characterize adequately 
the essential properties of topology preserving mappings. 

Considering the joint characteristics of the NES and the neural network 
models used, hybrid architectures employing topology preserving mappings are 
within reach. 

Fig. 1. Fitness values for metric MDS 

Fig. 2. Sammon’s measure: fitness 

Figures 1 and 2: The fitness values returned by the NES are plotted as 
bars for each generation. A missing bar indicates that the corresponding 
generation does not carry an improved individual conforming to the 
boundary restrictions. 

Fig. 3. Kohonen’s SOM 

Fig. 4. Elastic network 

Fig. 5. Linsker’s network 

Figures 3 to 5: Each of the graphs shows the values for metric 
multidimensional scaling (top left), Sammon’s measure (top right), C 
measure (bottom left), and for the Spearman coefficient $\rho_{sp}$ (bottom 
right). 

Fig. 6. Topographic mapping evolved using metric MDS as objective function: initial 
random configuration, typical intermediate, and final mapping. Ordered intermediate 
configurations are reached much faster compared to the S-measure case. 

Fig. 7. Topographic mapping evolved using Sammon’s measure: initial random 
configuration, intermediate, and final mapping. Intermediate configurations are less stable 
than those for the mMDS case (fig. 6). 

Fig. 8. Topographic mappings found by the Kohonen network (left) and by the 
neighborhood evolution strategy for the C measure (middle) and $\rho_{sp}$ (right) case. 



Abstract. In this paper¹ we report results for the prediction of thermodynamic 
properties based on neural networks, evolutionary algorithms 
and a combination of them. We compare backpropagation trained networks 
and evolution strategy trained networks with two physical models. 
Experimental data for the enthalpy of vaporization were taken from the 
literature in our investigation. The input information for both neural network 
and physical models consists of parameters describing the molecular 
structure of the molecules and the temperature. The results show the 
good ability of the neural networks to correlate and to predict the 
thermodynamic property. We also conclude that backpropagation training 
outperforms evolutionary training as well as simple hybrid training. 
Keywords: Neural Networks, Evolution Strategies, Hybrid-Learning, 
Chemical Engineering 



1 Introduction 

In chemical engineering the simulation of chemical plants is an important task. 
Millions of chemical compounds are known, yet experimental data are often 
not available. For this reason there is a need for calculation methods which are 
able to predict thermodynamic properties. Usually models are developed which 
have a physical background and where the model parameters have to be fitted 
with the aid of experimental data. This usually leads to nonlinear regression 
models with a multi-modal objective function, where evolution strategies are 
successfully used. In contrast to models with physical background, simple so-called 
incremental methods are widely used, too. Each functional group of a molecule 
gives a contribution to the thermodynamic property and the sum of all 
contributions has to be calculated. A new way for the calculation and prediction of 
thermodynamic properties is the use of neural networks. Descriptors, which can 
be derived from the molecular structure, have to be defined for the input layer. 
Then experimental data for a specific thermodynamic property can be used for 
training. Predictions of this thermodynamic property are then possible by using 
the molecular structure for a chemical compound where no experimental data 
are available. In this investigation the enthalpy of vaporization was taken. In 
section 2 we give a brief overview of the models used and continue in section 
3 with an experimental comparison of physical models, networks trained with 
backpropagation, networks trained with evolutionary algorithms and a 
combination of the latter two. 

¹ Acknowledgments: The work presented is a result of the Collaborative Research 
Center SFB 531 sponsored by the Deutsche Forschungsgemeinschaft (DFG). 

2 Models for the enthalpy of vaporization 

2.1 Physical models 

The physical background for the enthalpy of vaporization $\Delta H_v$ consists of 
electrostatic interactions forced by the atoms of the molecules. Equations can be 
derived from statistical thermodynamics in order to describe the interactions 
between molecules (first level) and between functional groups of these molecules 
(second level). Physical models such as UNIVAP² (UNIversal enthalpies of 
VAPorization), which summarizes the interactions between functional groups of the 
molecules within a pure liquid, were developed [5]. This model consists of sums 
of exponential terms, which include the interaction parameters and the 
temperature. The interactions are weighted by the surface fractions of functional 
groups of a molecule. The interaction parameters can be fitted to experimental 
enthalpy of vaporization data. This leads to a non-linear regression problem with 
a multi-modal objective function. This function consists of the mean absolute 
error (MAE) over all N experimental data points between the calculated values 
(physical model) and the experimental data: 

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| \Delta H_{v,i}^{calc} - \Delta H_{v,i}^{exp} \right|$$

Due to the complex structure of the physical model, especially the exponen- 
tial terms, multi-modality occurs. An evolution strategy for solving this problem 
was developed mm- Another theoretical approach is the so-called EBGVAP 
model (Enthalpy Based Group contribution model for enthalpies of VAPoriza- 
tion) which was used in our investigation, too. 
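
The objective function used for fitting the interaction parameters is the mean absolute error, which can be written directly in Python. The function name is chosen here for illustration, and any numbers passed to it in an example are made up rather than taken from the experimental data.

```python
def mean_absolute_error(calculated, experimental):
    """Mean absolute deviation between model values and measured values."""
    if not calculated or len(calculated) != len(experimental):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(c - e) for c, e in zip(calculated, experimental))
    return total / len(calculated)
```

An optimizer such as an evolution strategy would call this function with the model output for each candidate parameter set and minimise the returned value.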



2.2 Neural Networks 

Neural networks are able to acquire an internal model of a process by learning 
from examples. After successful training the network will be a model for the 
process which led to the experimental data. Theoretical results show that 
feed-forward networks are capable of arbitrarily exact function approximation, given 
an unlimited number of free parameters or infinite precision [3]. 

In our experiments we used simple feed-forward networks with non-linear 
sigmoid activation functions. As training algorithms we employed the standard 
Backpropagation algorithm [B] and various (p.,A) evolution strategies [7|lj as well 
as a combination of both. 

* For the UNIVAP model it was difficult to reach the critical point, where the
enthalpy of vaporization reaches zero. A modified temperature dependence was
used in this investigation to improve the performance at critical points
(UNIVAP2).






3 Experiments and Results 

A comparison of different methods or models can be approached in at least two
ways. On the one hand, a fair comparison should allow all models the same number
of free parameters to adjust to the problem. On the other hand, one can argue
that it is sufficient if a model performs well on previously unseen data,
regardless of the number of parameters it needs.

Some of our experiments were designed to find good neural models under
conditions as similar as possible to those of the physical models. Here the
number of adjustable parameters was almost the same for all models. In other
experiments we searched for good results independent of the number of free
parameters (weights) used. One difficulty is to find the optimal structure of
the neural network and the optimal structure of the temperature-dependent
equation of the physical model. Here we only investigated the structure of the
network. Another important issue is to have the same input information for all
methods, which can be derived from the structure of the molecules.



3.1 Generation and Description of the Data 

Selection of data. The experimental data concerning the enthalpy of
vaporization were taken from data handbooks. Data for three different classes of
chemical compounds were used: normal alkanes, 1-alcohols, and branched alcohols.
These data were chosen for the investigation of three different functional
groups, the so-called main groups: CH3, CH2 and CHnOH. The group CHnOH contains
the functional groups CH3OH, CH2OH and CHOH. The experimental data cover a
temperature range from 92 K to 776 K. The number of carbon atoms in the
n-alkanes goes from 2 (Ethane) to 19 (Nonadecane), for the 1-alcohols from 1
(Methanol) to 14 (Tetradecanol) and for the branched alcohols from 4
(2-Methyl-2-propanol) to 6 (2-Methyl-2-pentanol).

Selection of descriptors. There are several possibilities for the definition of
descriptors as input variables for a neural network: number of atoms, number of
single bonds, molar mass, dipole moment and topological parameters concerning
the connectivity between atoms [2]. In our investigation the descriptors for the
input layer are the surface fractions of the functional groups within a molecule
and the temperature. Therefore a definition of functional groups is necessary.
Here the definition of the UNIVAP model [8] shall be used.

Partitioning into subsets for cross validation. After generating the data
set it was subdivided into 3 classes: training (50%), validation (25%) and test
(25%) set. The training set was used to adapt the parameters of all our models.
The validation set could be used during the adaptation process to evaluate the
algorithm's performance on unknown data and to stop the adaptation process if
the error on the validation set increases. Validation and test set therefore
measure the generalization ability of our models. In total, 50% of the data were
used only for comparison, i.e. for a test of the prediction of the enthalpy of
vaporization. The distribution of the data can be seen in Table 1.
Group interaction        Parameters   Data points (total)   Data points (training)
CH3/CH3                  3            110                   53 (48.18 %)
CH3/CH2, CH2/CH2         9            138                   75 (54.35 %)
CHnOH/CHnOH              3            19                    10 (52.63 %)
CH3/CHnOH, CH2/CHnOH     12           162                   76 (46.91 %)
total:                   27           429                   214 (49.88 %)

Table 1. Number of experimental data for the different group interactions 
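A 50/25/25 partition of the kind described above could be sketched as follows. The random shuffling is an assumption made for this illustration; the text does not state how individual data points were assigned to the three sets. The function name and the seed are hypothetical.

```python
import random


def split_data(points, seed=0):
    """Shuffle and split into training (50%), validation (25%) and test (25%)."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)   # assumption: random assignment
    n_train = len(shuffled) // 2
    n_valid = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# 429 data points in total, as in Table 1 (indices stand in for the real data).
train, valid, test = split_data(range(429))
```

With 429 points this sketch yields 214 training points, which agrees with the total in Table 1, although the per-group proportions in the table would require a stratified split.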



Transformation. For the use with the neural network the data were normalized 
via separate linear transformations of main-groups, temperature and enthalpy to 
the interval [0.1 .. 0.9]. Network responses outside of this interval were mapped 
onto the boundaries and then re-transformed to the original scale. 
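The transformation can be illustrated with a small sketch. The helper name `make_scaler` is hypothetical; the temperature range of 92 K to 776 K is taken from the data description, and each main-group input and the enthalpy would get its own scaler in the same way.

```python
def make_scaler(lo, hi, a=0.1, b=0.9):
    """Return (forward, inverse) linear maps between [lo, hi] and [a, b]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = (b - a) / (hi - lo)

    def forward(x):
        # original scale -> network scale
        return a + (x - lo) * scale

    def inverse(y):
        # network responses outside [a, b] are mapped onto the boundaries first
        y = min(max(y, a), b)
        return lo + (y - a) / scale

    return forward, inverse


to_net, from_net = make_scaler(92.0, 776.0)   # temperature range in K
```

Clipping before the inverse transformation guarantees that a re-transformed network response never leaves the range covered by the data.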



3.2 Physical Model Experiments 

The training set was used for the regression of the interaction parameters and
the training of the neural network. First the parameters were computed
successively, i.e. first the 3 parameters for the interaction CH3/CH3 were
calculated, then with these parameters the next 9 parameters (corresponding to
Table 1) were calculated, and so on. These sequential experiments for the
physical models were done with the aid of an encapsulated evolution strategy
without a correlated
step-length control [S]: [GG 3 -I- 8{GG 7 + 19)^°°]^° 

Three different runs with each 1.2-10® function calls of the evolution strategies 
gave similar results. These results were then refined by the simplex method
with 30 different runs of 1000 iterations each. The best results for UNIVAP2
(seq) and EBGVAP (seq) can be found in Table 2. In contrast to this sequential
regression of the model parameters, a simultaneous regression (sim) of all 27
parameters was investigated using an encapsulated ES similar to that of the
sequential experiments. The results of these runs were also improved by the
simplex method and can be seen in Table 2.





         UNIVAP2 (seq)   EBGVAP (seq)   UNIVAP2 (sim)   EBGVAP (sim)
Train    0.914           0.635          1.128           3.784
Valid    1.190           0.867          1.353           4.436
Test     0.948           0.613          1.209           3.863
All      1.017           0.705          1.230           4.028

         NN-A            NN-B           NN-ES (best)    NN-ES (avrg)
Train    0.652           0.570          0.612           1.143
Valid    0.566           0.878          0.876           1.536
Test     0.686           0.703          0.747           1.357
All      0.635           0.717          0.745           1.345


Table 2. Mean absolute error per pattern for different data sets and models

Fig. 1. Training error (η = 0.8)    Fig. 2. Validation error (η = 0.8)

3.3 Neural Networks Experiments (Backpropagation) 

The learning rate η and the architecture of the network (number of hidden units
and connections) have the biggest influence on the performance of the network.
To find good neural network solutions we carried out a simple parameter study.
We first varied the learning rate with a fixed architecture which had
approximately the same number of free parameters (connections) as the UNIVAP
methods. With the best learning rate found, we searched for a good number of
hidden units. All runs were performed 10 times.

Variation of the learning rate. We fixed the architecture of the network at
4 input, 4 hidden and 1 output units (4-4-1) to have approximately the same
number of free parameters (25) as the UNIVAP method. We started with a very
low learning rate η = 0.001 and ended with a far too high rate η = 10.0. The
momentum term α was fixed to 0.2. A training run was stopped after it reached
the error limit or exceeded a maximum of 100,000 pattern presentations
(epochs). The error tss is the summed squared difference between the target
vector t and the output activation o of the network.
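The roles of the learning rate η and the momentum term α can be illustrated with the textbook momentum update, in which each weight change is the scaled negative gradient plus a fraction α of the previous change. This is a generic sketch of that rule, not the authors' implementation; the function name and the example numbers are hypothetical, and the default η = 0.8 is the value reported for Figures 1 and 2.

```python
def momentum_step(weights, grads, prev_delta, eta=0.8, alpha=0.2):
    """One gradient-descent step with momentum.

    delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
    """
    delta = [-eta * g + alpha * d for g, d in zip(grads, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, delta)]
    return new_weights, delta


# Hypothetical single-weight example.
w, dw = momentum_step(weights=[1.0], grads=[0.5], prev_delta=[0.1])
```

A very small η makes every step tiny, which matches the slow progress reported for low rates, while a very large η overshoots and produces oscillating error curves.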

Figures 1 and 2 show the curves for 10 different runs with the best learning
rate, which was used throughout all other experiments. The left-hand figure
gives the error on the training set and the right-hand figure the validation
error. If an error curve reaches the base of the graph it satisfied a specified 
error limit {tss < 5 x 10“®) for the whole training set. Networks with very low 
learning rates never reached the specified error limit because learning
progressed too slowly. Too high a rate resulted in oscillating error curves.

Variation of the number of hidden units. After the variation of η we used the
best rate as a constant for the hidden unit search. (This does not mean that
both parameters are independent of each other; we consider this value to be a
first estimate to start with.) The number of hidden units was varied between 1
and 40. Networks with fewer than 3 units failed to learn the task. Up to 40
units the results on training as well as validation data were almost
independent of the number of units employed. We therefore used our initial
4-4-1 network. This is an additional advantage because it can now be directly
compared to other methods which use the same number of free parameters.



3.4 Neural Networks Experiments (Evolution Strategy) 

In this experiment we substituted the Backpropagation algorithm with an
evolution strategy. Some authors [10] reported good results when training a
network with an ES. Again we systematically searched for a good parametrisation
of the (15,100)-ES. Parameters under consideration were the number of mutation
step-sizes σi and the recombination scheme used on the object variables xi
(the network weights). Each parameter setting was run for 10,000 generations
(1,000,000 pattern presentations) and repeated 10 times to have some
statistical validity. All of the following variations of the bisexual
recombination scheme were done with 1 and 25 step-sizes σ: no recombination of
xi and σi; discrete recombination of xi and discrete of σi; discrete
recombination of xi and intermediate of σi; intermediate recombination of xi
and discrete of σi; intermediate recombination of xi and intermediate of σi.
For details on ES see [7].
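To make the parametrisation concrete, the following is a minimal sketch of a (15,100)-ES with n step-sizes and intermediate bisexual recombination of both xi and σi. It minimises a toy sphere function that stands in for the network training error. The log-normal step-size mutation and the learning rates `tau` and `tau0` are common textbook defaults and are assumptions here, as are all names, the initial values and the short run length.

```python
import math
import random


def evolve(objective, n, mu=15, lam=100, generations=200, seed=0):
    """Minimise objective with a (mu,lambda)-ES using n self-adapted step-sizes."""
    rng = random.Random(seed)
    tau0 = 1.0 / math.sqrt(2.0 * n)             # global learning rate (assumed)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))   # per-coordinate rate (assumed)
    parents = [([rng.uniform(-1.0, 1.0) for _ in range(n)], [0.5] * n)
               for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            (x1, s1), (x2, s2) = rng.sample(parents, 2)   # two parents
            # intermediate recombination of object variables and step-sizes
            x = [(a + b) / 2.0 for a, b in zip(x1, x2)]
            s = [(a + b) / 2.0 for a, b in zip(s1, s2)]
            # log-normal mutation of step-sizes, then Gaussian mutation of x
            g = rng.gauss(0.0, 1.0)
            s = [si * math.exp(tau0 * g + tau * rng.gauss(0.0, 1.0)) for si in s]
            x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, s)]
            offspring.append((objective(x), x, s))
        # comma selection: the next parents come from the offspring only
        offspring.sort(key=lambda item: item[0])
        parents = [(x, s) for _, x, s in offspring[:mu]]
    best_x = min((x for x, _ in parents), key=objective)
    return best_x, objective(best_x)


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = evolve(sphere, n=5)
```

For network training, `objective` would evaluate the training error of the 4-4-1 network for a weight vector of length 25; using a single step-size instead of n corresponds to sharing one σ across all coordinates.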

None of the parameter settings led to good and reliable results. Only one of
all the ES-trained networks performed comparably to Backpropagation. All other
networks gave rather poor results. The quality of the average result did improve
when using backpropagation as a local search procedure (an additional training
of 250,000 epochs) after ES optimization, but was not as good as Backpropagation
alone. Figure 3 shows the best run, which we regard as a very rare event, with
a (15,100)-ES.




Fig. 3. (15,100)-ES (error during training) 
3.5 Comparison

For the comparison, we took two network architectures with the learning rate
found in the previous experiments. Architecture A has 4 hidden units and nearly
the same number of free parameters (25 weights) as the UNIVAP models (27).
Architecture B performs similarly and has 6 hidden units (37 weights). Table 2
gives an overview of all experiments.

1. Parameters for NN-A (4-4-1): η = 0.8, epochs = 250,000

2. Parameters for NN-B (4-6-1): η = 0.8, epochs = 250,000

3. Parameters for NN-ES (4-4-1): best (15,100)-ES, #σ = n, intermediate
recombination of xi and σi (100,000 generations)

4. Parameters for NN-ES (4-4-1): average (15,100)-ES, #σ = n, intermediate
recombination of xi and σi (100,000 generations)
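The weight counts quoted for the two architectures can be checked with a short calculation. The sketch assumes fully connected layers with one bias weight per hidden and output unit, which is an assumption that reproduces the quoted numbers; the function name is hypothetical.

```python
def count_parameters(layers):
    """Free parameters of a fully connected feed-forward net, biases included."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


nn_a = count_parameters([4, 4, 1])   # architecture A (4-4-1)
nn_b = count_parameters([4, 6, 1])   # architecture B (4-6-1)
```

Under this assumption architecture A has 25 and architecture B 37 free parameters, compared with 27 interaction parameters for the physical models.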

As an additional test for generalisation ability, we used all data of the
ethane molecule. In Figures 4 and 5 we compare all models on the enthalpy
prediction for ethane. It can be seen that the physical model and the neural
network perform equally well on this task, except for the critical region near
the critical temperature Tcr, where ΔHv(Tcr) = 0 J/mol and the network
outperforms all other models. Almost all networks trained with an ES and the
UNIVAP model give only a poor linear approximation of the enthalpy curve.

Fig. 4. Performance on ethane (good)    Fig. 5. Performance on ethane (poor)

4 Discussion 

The most important result of this investigation is that both neural and
physical methods are well able to correlate and to predict the enthalpy of
vaporization. Neural networks with simple Backpropagation training are as good
as physical models, and at critical temperatures even slightly better, while
their computational effort is much lower. The comparison of the results for
UNIVAP2 and EBGVAP shows the influence of the structure of the model itself.
Further investigations could use evolutionary algorithms to optimize the
structure of the models with regard to the temperature dependence. For the
neural networks it can be stated that the use of surface fractions of functional
groups as descriptors leads to good results for both correlation and prediction.
The big advantage of this new procedure is that the molecules can easily be
divided into functional groups, which makes it easy to use in engineering
applications and allows the direct comparison of neural networks and physical
models, because both use the same input information. The investigations
concerning the architecture of the neural networks show that a simple network
structure is sufficient and that a more complicated network does not give better
results. In this context, evolution strategies as training algorithm and
combinations of ES with backpropagation failed to deliver useful models in
almost all experiments.

From a thermodynamic point of view, it is interesting that a simple method
like a neural network can give results similar to those of much more
complicated, physically motivated models. If a physical model gives results of
lower quality than a neural model, the physical model should be improved.
However, in chemical engineering there are many thermophysical properties which
are usually described not by physical methods but by incremental methods. These
methods, for example for critical data, normal boiling points and so on, could
be replaced by neural networks. Nevertheless, these results are only first
steps in developing efficient network structures for our purpose; in
particular, investigations with more functional groups will give a better
comparison between physical models and neural networks.
Abstract. Cellular Automata architectures are attractive due to their
fine-grain parallelism, simple computational structures and local routing
resources. Some researchers have used genetic algorithms to find CA that
perform useful computations. The inherently parallel cellular automata model
as well as the genetic algorithm are poorly suited to implementation on general
purpose microprocessor based systems. Field Programmable Gate Arrays are an
alternative that can provide significant speedup. This paper describes the
Xilinx XC6216 Field Programmable Gate Array and how it is used to efficiently
search a hybrid 2-state, 5-neighbour cellular automata rule space that exhibits
computation universality. Its application to an image processing problem,
binary texture analysis, is discussed.

Keywords: FPGA, Cellular Automata, Genetic Algorithm 



1 Introduction 

Cellular Automata (CA) have been considered as a model for general purpose 
computation by several authors. Several works describe CA capable of universal 
computation (computational power equivalent to a universal Turing machine [1]). 
Field Programmable Gate Arrays (FPGA) can provide a programmable, max- 
imally parallel implementation of CA but can only efficiently implement small 
CA. Large CA require multiple FPGA and/or time multiplexing where array 
initialisation and result reading soon dominate the computation time. Searching 
CA rule spaces is one application which typically uses small test examples and 
therefore small CA. 

Papers by Richards et al. [2], Mitchell [3] and Sahota [4] describe the use of
genetic algorithms to search CA rule spaces where fitness is a function of how
well the CA behaviour matches a desired algorithm output. We suggest that
CA individuals be implemented in an FPGA so that the time spent on the
time-consuming fitness evaluation task can be reduced. By using rapid
reconfigurability, individuals can then be swapped in and out of hardware as
required by the genetic algorithm [5].

Cellular Automata have long been considered as an ideal architecture for
spatially distributed/inherently parallel image processing applications [6]. We
are particularly interested in the application of small CA for block based image
processing applications such as local feature identification. In this paper we
propose the use of XC6200 series FPGAs to accelerate exploration of CA rule
spaces to find solutions to such problems.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 114-121, 1999.

Previous work in this area has applied genetic algorithms to fixed size CA
rule tables. We consider a unique XC6200 hybrid CA model and search a
hardware dictated rule space based on the FPGA structure. This rule space has
the advantage of computation universality as well as ease of implementation.

Section 2 will introduce Field Programmable Gate Arrays and the Xilinx
XC6216 device. Section 3 describes the XC6216 hybrid CA model and discusses
the allocation of hardware resources and universal computation. Application
of genetic algorithms and issues of representation are discussed in Section 4.
Section 5 presents the CA Evolver experimental apparatus and its application
to a block based image processing application: binary texture analysis. Results
and observations will be presented in Section 6 followed by a summary and
discussion of future work in Section 7.



2 Field Programmable Gate Arrays and the XC6216 

Field Programmable Gate Arrays are now a popular implementation style for
digital logic systems and sub-systems. These devices consist of an array of
uncommitted logic gates whose function and interconnection is determined by
downloading configuration information to the device. When the programming
configuration is held in static RAM, the logic function implemented by those
FPGAs can be dynamically reconfigured in fractions of a second by rewriting
the contents of the SRAM configuration memory.

The XC6216 FPGA by Xilinx, depicted in Figure 1(a), has a regular 64 by
64 two-dimensional array of function blocks. These function blocks, depicted
in Figure 1(b), contain a function unit capable of implementing any 2 input
logic gate or 3 input multiplexer as well as a D-type flip-flop. Each block also
contains local routing resources to adjacent neighbours (Nout, Sout, Eout, Wout
multiplexers).

Fig. 1. (a) The XC6216 FPGA and (b) The XC6216 Cell

Function blocks are grouped hierarchically for routing purposes. At the lowest
level blocks are interconnected to the nearest neighbours in four directions.
These blocks are grouped into 4x4 groups and supplemented with length 4
fastlanes. These 4x4 groups then form part of larger 16x16 groups which are
further supplemented by length 16 wires and chip-length interconnects. The
configuration memory of the Xilinx XC6216 FPGA is mapped directly to the host
processor, enabling partial reconfigurability where part of an FPGA device can
be programmed at high speed without affecting the rest of the design. Readers
are referred to [7] for a more detailed description of the XC6216 device.

3 The XC6216 hybrid Cellular Automata 

In their simplest form Cellular Automata can be considered a homogeneous
array of cells in one, two, three or more dimensions. Each cell has a finite
discrete state. Cells communicate with a number of local neighbours and update
synchronously according to deterministic rules [8]. One of the simplest CA
models has 5 neighbours and 2 states. The state of each binary 0/1 cell depends
on the state of the cell at the previous time step, plus the state of the four
north, south, east and west neighbours at the previous time step.
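The 2-state, 5-neighbour update described above can be sketched in software as follows. The 32-entry rule table is one way to represent an arbitrary function of the five inputs. The bit ordering of the table index, the wrap-around boundaries and the parity rule used in the example are assumptions chosen for illustration.

```python
def step(grid, rule):
    """One synchronous update of a 2-state, 5-neighbour CA on a torus.

    rule is a sequence of 32 bits indexed by the 5-bit number formed from
    (centre, north, south, east, west) at the previous time step.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            centre = grid[r][c]
            north = grid[(r - 1) % rows][c]
            south = grid[(r + 1) % rows][c]
            east = grid[r][(c + 1) % cols]
            west = grid[r][(c - 1) % cols]
            index = (centre << 4) | (north << 3) | (south << 2) | (east << 1) | west
            new[r][c] = rule[index]
    return new


# Example rule: next state is the parity of the five inputs.
parity_rule = [bin(i).count("1") % 2 for i in range(32)]
start = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
after_one_step = step(start, parity_rule)
```

Because the new grid is built from the old one before any cell is overwritten, all cells update synchronously, as the CA definition requires.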

To implement this CA model, we need to be able to implement any logic
function of 5 variables. This can be achieved with a look up table, or
alternatively with the logic tree of Figure 2(a). This logic tree is implemented
in a 4 by 4 group of XC6216 function blocks illustrated in Figure 2(b). The
function block outputs which implement the logic tree structure are illustrated
by solid arrows. The CA cell state is stored in function block 5. This function
block output is routed back throughout the 4 by 4 group and to all north, south,
east and west faces of the cell. Its path through the local routing resources is
illustrated by dashed arrows.

State information from a CA cell’s north, south, east and west neighbours
is routed across all function blocks through the XC6216 length 4 fastlanes. This
means that every function block has access to all 5 boolean state variables. The
4 by 4 group of Figure 2(b) is replicated across the XC6216 to form a 15 by 15
two dimensional Cellular Automata. The North/South and East/West ends of
the array are wrapped around via the XC6216 chip length routing resources.

Codd [8] proved that a 2-state, 5-neighbour CA capable of universal
computation does not exist with a finite initial configuration. Several authors
achieved computation-universal CA models by adding more states or larger
neighbourhoods [9]. We enhance the computational power of Figure 2(a) by
including the additional memory resources (flip-flops) that are freely available
in each XC6216 function block. This allows us to consider a CA model based on
neighbourhood information from further back in time. With this in mind,
unbounded but boundable propagation is easily demonstrated. We have also
demonstrated the computation universality of this hybrid CA model by simulating
Minsky’s two register machine [9].

Fig. 2. (a) Flexible Logic Tree (b) Function Block Assignment



4 Searching the XC6216 hybrid CA rule space 

To search the XC6216 hybrid CA rule space with a genetic algorithm several 
choices must be made concerning representation. In similar experiments, based 
in software, individuals represent fixed size rule tables. Ease of implementation 
leads us to consider a hardware dictated rule space with individuals based on 
the XC6216 configuration bit string. We consider two different hardware repre- 
sentations. 

FPGA Bitstring: The first is based on the raw configuration bitstring of the 4
by 4 group of XC6216 function blocks. Connectivity between each group is
pre-defined (to implement the 15 by 15 two dimensional CA array) but all routing
and functionality within the group varies according to the genetic algorithm.

Logic Tree: The second representation is based on the flexible logic tree of
Figure 2(a). In this case the routing resources both between and within each 4
by 4 group of function blocks are pre-defined and we evolve only function. A
performance comparison of the FPGA Bitstring and Logic Tree hardware
representations, as well as a software based CA rule table representation, is
presented in Section 6.

The function unit in each XC6216 function block is illustrated in Figure 3(a)
and defined by the two eight-bit configuration bytes of Figure 3(b). CS defines
whether or not the function unit will make use of the flip-flop resource. The
X1, X2 and X3 configuration bits define the input signals to the logic gate or
multiplexer. Y2 and Y3 define the gate or multiplexer type. RP sets the flip-flop
as read-only and M defines additional routing resources.

Fig. 3. (a) XC6216 Function Unit and (b) Configuration Bytes

When using the Logic Tree representation a CA individual is completely
specified by defining these two bytes for each function block within the 4 by
4 group. The effective genetic string length for the Logic Tree CA is therefore
32 bytes or 256 bits. The FPGA Bitstring CA requires an additional byte per
function block. Configuration byte 3, illustrated in Figure 3(b), defines the
local Nout, Sout, Eout, Wout multiplexers of Figure 1(b). The effective genetic
string length for the FPGA Bitstring CA is therefore 48 bytes or 384 bits.
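The two string lengths follow directly from the number of function blocks in a 4 by 4 group and the number of configuration bytes evolved per block. A short check, with hypothetical names:

```python
BLOCKS_PER_GROUP = 4 * 4   # one CA cell occupies a 4 by 4 group of function blocks


def genome_bits(bytes_per_block):
    """Genetic string length in bits for a given number of bytes per block."""
    return BLOCKS_PER_GROUP * bytes_per_block * 8


logic_tree_bits = genome_bits(2)       # two function-unit bytes per block
fpga_bitstring_bits = genome_bits(3)   # plus the routing byte per block
```

The extra routing byte enlarges the search space from 2^256 to 2^384 candidate configurations, which is one reason the constrained Logic Tree representation is easier to search.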



5 The CA Evolver Experimental apparatus 

Our experiments are based on the Hot Works development system. This system
incorporates a Xilinx XC6216 FPGA that communicates with a host processor
through a 32 bit PCI bus. The main genetic algorithm loop and genetic operators
are implemented on the host processor. A population of CA individuals is held
in memory and downloaded into the XC6216 as required to perform the fitness
evaluation task.

The reproduction scheme is similar to that used in [3]. The fittest 100 CA 
individuals from a population of 300 are copied directly to the next generation. 
The remaining 200 are generated by application of the cross-over and mutation 
operators on parents chosen with replacement from the elite 100. 

For the FPGA Bitstring GA a one point cross-over is applied to the 384 bit
configuration bit string. For the Logic Tree representation a specialised
cross-over that constrains the structure of CA offspring to the logic tree of
Figure 2(a) is used. A cross-over branch is randomly selected, which defines a
sub-tree formed by all child nodes and branches below it. This sub-tree is then
exchanged between two parents, similar to genetic programming [5]. In both cases
offspring are mutated in exactly four randomly chosen positions.
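The reproduction scheme for the FPGA Bitstring representation can be sketched as below. Individuals are modelled as lists of 384 bits. Producing a single child per cross-over and the function and parameter names are assumptions of this sketch; the fitness function passed in the example is a placeholder, not the image-based fitness used in the experiments.

```python
import random


def next_generation(population, fitness, rng, elite=100, n_mutations=4):
    """Elitist reproduction: keep the fittest, fill up with mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = [list(ind) for ind in ranked[:elite]]      # copied unchanged
    children = []
    while len(elites) + len(children) < len(population):
        # parents are chosen with replacement from the elite
        p1, p2 = rng.choice(elites), rng.choice(elites)
        cut = rng.randrange(1, len(p1))                 # one point cross-over
        child = p1[:cut] + p2[cut:]
        # mutate exactly n_mutations distinct, randomly chosen positions
        for pos in rng.sample(range(len(child)), n_mutations):
            child[pos] ^= 1
        children.append(child)
    return elites + children


rng = random.Random(1)
population = [[rng.randint(0, 1) for _ in range(384)] for _ in range(300)]
new_population = next_generation(population, fitness=sum, rng=rng)
```

For the Logic Tree representation the slicing step would be replaced by the sub-tree exchange described above, so that offspring keep the pre-defined tree structure.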

The XC6216 CA Evolver system is applied to a binary texture analysis prob- 
lem which involves identifying a particular pattern within a 15 by 15 pixel area. 
The binary pattern chosen in this experiment is diagonal lines. Two screenshots 
of the CA Evolver system, each with training image on the left and ideal image 
on the right, are depicted in Figure 4. Each training image is segmented
either horizontally as in Figure 4(a) or vertically as in Figure 4(b). The
non-patterned segment is filled with noise of a density randomly selected from a
uniform distribution between 0 and 1.

Fig. 4. (a) Horizontally and (b) vertically segmented: input image left/ideal image right

The state of each CA cell is first initialised with the corresponding point in
the 15 by 15 training image. The XC6216 is then clocked for 200 cycles at 33 MHz
to iterate the CA array. The fitness of the CA individual is calculated by a bit
by bit comparison of the state of the CA cells and the ideal image. A fitness
counter is incremented for each CA cell in correspondence with the ideal image.
While the counter is greater than zero it is also decremented for each cell that
is not in correspondence. As in [4], CA individuals that lead to collapsed
images (the state of the CA array ends up all 1’s or all 0’s) are penalised and
receive a fitness score of 0. Each individual within the population is evaluated
against 300 training images at each generation. As in [4], the overall fitness
for an individual is calculated as the root mean square over the 300 images.
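The fitness calculation can be sketched as follows. The row-by-row scan order is an assumption (the counter is order dependent because it cannot fall below zero), and the function names and the tiny 2 by 2 example are hypothetical.

```python
import math


def image_fitness(final_state, ideal):
    """Counter-based fitness of one CA run against one ideal image."""
    state_bits = [b for row in final_state for b in row]
    ideal_bits = [b for row in ideal for b in row]
    # collapsed images (all 1's or all 0's) are penalised with a score of 0
    if all(b == 1 for b in state_bits) or all(b == 0 for b in state_bits):
        return 0
    counter = 0
    for s, t in zip(state_bits, ideal_bits):
        if s == t:
            counter += 1
        elif counter > 0:
            counter -= 1
    return counter


def overall_fitness(scores):
    """Root mean square of the per-image scores."""
    return math.sqrt(sum(s * s for s in scores) / len(scores))


score = image_fitness([[1, 0], [0, 1]], [[1, 0], [1, 1]])
```

In the experiments the per-image score would be computed for each of the 300 training images and the resulting list passed to `overall_fitness`.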

6 Performance Comparison 

The computation times for the XC6216 based genetic algorithms were estimated
from the CA Evolver experiments. A software based experiment that searched
the XC6216 hybrid CA rule table was also implemented and is compared in
Table 1. The computation time for the genetic algorithm can be calculated as:

Ttotal = Tpop + Generations × (Tfit + Tga)

where Tpop is the time to generate a population, Tfit is the time needed to
evaluate the fitness of a population and Tga is the time to generate a new
population with the GA operators.

Measurements were based on non-optimised code and it is expected that
execution time could be significantly reduced. The relative measurement of
speedup can be considered accurate and is summarised in Table 1.



Measurement in seconds   Software rule table   FPGA Bitstring   Logic Tree   Relative speedup
Tpop                     0.1                   0.1              0.1          1
Tfit per generation      1987.9                34.22            34.17        58
Tga per generation       0.05                  0.06             0.08         1

Table 1. Estimation of Speedup 
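As a worked example, the timing formula can be evaluated with the measurements of Table 1. The run length of 100 generations is the one used in the quality experiments; the function and variable names are hypothetical.

```python
def total_time(t_pop, t_fit, t_ga, generations):
    """Ttotal = Tpop + Generations * (Tfit + Tga), all times in seconds."""
    return t_pop + generations * (t_fit + t_ga)


GENERATIONS = 100
software = total_time(0.1, 1987.9, 0.05, GENERATIONS)
fpga_bitstring = total_time(0.1, 34.22, 0.06, GENERATIONS)
logic_tree = total_time(0.1, 34.17, 0.08, GENERATIONS)
overall_speedup = software / fpga_bitstring
```

Because fitness evaluation dominates every other term, the overall speedup for a complete run is practically the same as the speedup of Tfit alone.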
Fig. 5. Average pixel error in experiments (plot title: Quality Comparison; series: FPGA, LOGIC, SOFT)



We also make a qualitative comparison of the two FPGA based GAs (FPGA
Bitstring and Logic Tree representations) and the software based rule table
GA. For each experimental run the GA had 100 generations in which to find a CA
that correctly classifies the diagonal line pattern. Figure 5 illustrates the
average pixel error as a function of GA time (generations). The Logic Tree
representation outperformed the FPGA Bitstring representation in all
experiments. We conclude that constraining connectivity and evolving only
function avoids strange feedback loop/analog circuit behaviours and allows a
more detailed exploration of a smaller, more relevant rule space.

In several runs individuals of note appeared that performed well for images
segmented in one direction (e.g. vertically) but poorly in the other direction.
To encourage the GA to find solutions in both directions the Logic Tree search
space was further constrained. The multiplexer selector of node 4 in Figure 2(a)
was pre-defined as the cell’s current state in order to reduce the size of the
non-symmetric rule space. Due to the close spatial relationship between
training and ideal images, two nodes of type 1 (one in each half of the logic
tree) were also pre-defined to store the initial training image data. The
best-of-run CA from these experiments had an average pixel error of 40 pixels,
lower than any other CA found. We conclude that the Logic Tree GA can implement
problem specific constraints more easily than raw FPGA bitstring and rule table
based approaches.
7 Summary and Future Directions

The CA Evolver is a powerful tool with which to investigate genetic algorithm
’programming’ techniques for cellular automata architectures. When searching
XC6216 hybrid CA rule spaces, a speedup on the order of 10-100 times can be
achieved compared to software implementations. Our initial experiments applied
this tool to a binary texture analysis problem and investigated the role
of representation. The FPGA based search space allowed problem specific
constraints to be implemented easily, leading to improved performance of the
genetic algorithm.

Questions yet to be addressed include the role of non local CA neighbour- 
hoods, extension to practical application and issues of scale. Eventually our work 
hopes to redefine these questions in terms of how best to ’program’ XC6216 ar- 
chitectures as massively parallel machines and investigate a range of cellular 
automata models including random boolean networks. 



Acknowledgement 

This work was carried out in the Cooperative Research Centre for Satellite Sys- 
tems with financial support from the Commonwealth of Australia through the 
CRC program. 



References 

1. J. E. Hopcroft and J. D. Ullman, "Introduction to Automata Theory, Languages
and Computation", Addison-Wesley, 1979, Reading, Massachusetts.

2. F. C. Richards, T. P. Meyer, N. H. Packard, "Extracting Cellular Automaton
Rules Directly from Experimental Data", Cellular Automata: Theory and
Experiment, H. Gutowitz, ed., 1st MIT Press ed., 1991, Massachusetts.

3. M. Mitchell, J. P. Crutchfield and R. Das, "Evolving Cellular Automata with
Genetic Algorithms: A Review of Recent Work", First International Conference on
Evolutionary Computation and Its Applications (EvCA’96), Moscow, Russia.

4. P. Sahota, M. F. Daemi, D. G. Elliman, "Training Genetically Evolving Cellular
Automata for Image Processing", International Symposium on Speech, Image
Processing and Neural Networks, 13-16 April, 1994, Hong Kong.

5. J. R. Koza, F. H. Bennett III, J. L. Hutchings, S. L. Bade, M. A. Keane,
D. Andre, "Rapid Reconfigurable Field-Programmable Gate Arrays for Accelerating
Fitness Evaluation in Genetic Programming", Late Breaking Papers at the Genetic
Programming 1997 Conference, J. R. Koza, ed., pp. 121-131, 1997, Stanford.

6. A. Rosenfeld, "Parallel Image Processing Using Cellular Arrays", Computer,
No. 16, pp. 14-20, 1983.

7. Xilinx Inc., "XC6200 Field Programmable Gate Arrays", Product Description,
Version 1.10, April 24, 1997.

8. E. F. Codd, "Cellular Automata", Academic Press, 1968, New York.

9. E. R. Banks, "Universality in Cellular Automata", IEEE 11th Annual Symposium
on Switching and Automata Theory, pp. 194-215, 1970, Santa Monica, California.




Asynchronous Island Parallel GA 
Using Multiform Subpopulations 



Hirosuke Horii, Susumu Kunifuji, and Teruo Matsuzawa 
{holly, kuni, matuzawa}@jaist.ac.jp

Japan Advanced Institute of Science and Technology 
Tatsunokuchi, Ishikawa, 923-1292, JAPAN 



Abstract. Island Parallel GA divides a population into subpopulations
and assigns them to processing elements on a parallel computer. Each
subpopulation then searches for the optimal solution independently and
exchanges individuals periodically. This exchange operation is called
migration. In this research, we propose a new algorithm in which migrants
are exchanged asynchronously among multiform subpopulations that have
different search conditions. The effect of our algorithm on combinatorial
optimization problems was verified by applying it to the Knapsack Problem
and Royal Road Functions on a CRAY-T3E parallel computer. The results show
that our algorithm maintains the population's diversity effectively and
searches for building blocks efficiently.



1 Introduction 

There are two typical problems in Genetic Algorithms (GAs). First, GAs
require a huge calculation time for their genetic operations, such as
selection, crossover and mutation, and for the fitness evaluation of
individuals. Secondly, the population's diversity must be maintained to avoid
premature convergence, in which a locally optimal solution spreads through
the population and evolution stagnates.

Island Parallel GA divides the population into subpopulations and assigns
them to processing elements on a parallel computer to improve the processing
speed. It then performs migration, that is, it exchanges individuals among
subpopulations so that their diversity is maintained.

Migration can be implemented in two ways: individuals are exchanged either
synchronously (the synchronous migration model) or asynchronously (the
asynchronous migration model). Tanese's model, called Distributed GA, is the
typical synchronous migration model. It exchanges individuals among
neighboring subpopulations synchronously at a fixed number of generations,
called the migration interval. If individuals are introduced before the
search has converged, it is difficult to generate superior schemata because
good schemata are destroyed. It is therefore effective to introduce
individuals after the search has converged. However, the progress of the
search differs from problem to problem and from subpopulation to
subpopulation, which makes it difficult to set an optimal migration interval.

Munetomo's model is the typical asynchronous migration model, in which a
subpopulation performs migration asynchronously when the diversity of its
genetic construction is lost. The loss of diversity is judged by the relative
reduction rate of the standard deviation of the fitness distribution.
Migrants are introduced from the subpopulation that has the most different
genetic construction. The differences among the subpopulations' genetic
constructions are measured by the differences among the average values and
standard deviations of their fitness distributions. However, it is difficult
to grasp the search situation correctly in problems where similarity of genes
does not correspond to similarity of fitness, such as combinatorial
optimization problems.

Considering the problems of both models, we propose a new algorithm that
performs migration asynchronously, together with a new migration scheme that
is more suitable for combinatorial optimization problems.

2 Asynchronous Island Parallel GA 

This section describes the features and aims of our algorithm. We propose a
new migration scheme and a cooperative search by subpopulations that have
different search conditions.

2.1 Migration 

In order to maintain the diversity of the genetic construction, migrants must
be introduced from the subpopulation that has the most different genetic
construction. This requires measures for grasping the search situation and
for characterizing the genetic construction of each subpopulation. For this
purpose we propose to use Bias and Temporal Schema [4].

Bias and Temporal Schema are defined as follows. The population is expressed
by a matrix $P(t)$ whose rows are the bit strings of the individuals, where
$p_{ij}^{t}$ denotes the $j$-th locus of the $i$-th individual at generation
$t$. The diversity of a population with population size $N$ and gene length
$L$ is measured by the Bias $B^{t}$ ($0.5 \le B^{t} \le 1.0$). The Temporal
Schema $TS^{t}$ is a string of length $L$ that shows which loci have
converged to 1 or 0 at generation $t$. $K_{TS}$ is the schema detection
threshold, which is set in the range $0.5 < K_{TS} < 1.0$. A locus is
considered to have converged when more than a fraction $K_{TS}$ of the
individuals have 1, or have 0, at that locus.



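$$P(t) = \left( p_{ij}^{t} \right), \qquad i = 1, \ldots, N, \quad j = 1, \ldots, L \qquad (1)$$

$$B^{t} = \frac{1}{N L} \sum_{j=1}^{L} \max\left( \sum_{i=1}^{N} p_{ij}^{t}, \; \sum_{i=1}^{N} \left( 1 - p_{ij}^{t} \right) \right) \qquad (2)$$

$$TS^{t} = \left( ts_{1}^{t}, \ldots, ts_{L}^{t} \right), \qquad
ts_{j}^{t} =
\begin{cases}
1 & : \sum_{i=1}^{N} p_{ij}^{t} > N \times K_{TS} \\
0 & : \sum_{i=1}^{N} \left( 1 - p_{ij}^{t} \right) > N \times K_{TS} \\
* & : \text{otherwise}
\end{cases} \qquad (3)$$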

Each subpopulation judges whether the diversity of its genetic construction
has been lost by comparing the Bias $B^{t}$ with a fixed threshold $K_{B}$.
When $B^{t}$ exceeds $K_{B}$, migration is performed. The Hamming distances
between the Temporal Schema of the subpopulation that performs migration and
those of the other subpopulations are used to measure how different their
genetic constructions are. By introducing migrants, the subpopulation not
only recovers the diversity of its genetic construction but can also obtain
superior schemata generated by the other subpopulations.
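The migration test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes the usual majority-allele form of the Bias measure, the function names are hypothetical, and counting the '*' symbol as an ordinary third symbol in the Hamming distance is an assumption, since the text does not say how unconverged loci are compared.

```python
from typing import List

Population = List[List[int]]  # N individuals (rows) x L loci (columns), entries 0 or 1


def bias(pop: Population) -> float:
    """Bias: mean over loci of the majority-allele frequency (0.5 to 1.0)."""
    n, length = len(pop), len(pop[0])
    total = 0
    for j in range(length):
        ones = sum(ind[j] for ind in pop)
        total += max(ones, n - ones)
    return total / (n * length)


def temporal_schema(pop: Population, k_ts: float) -> str:
    """One symbol per locus: '1' or '0' if converged, '*' otherwise."""
    n, length = len(pop), len(pop[0])
    symbols = []
    for j in range(length):
        ones = sum(ind[j] for ind in pop)
        if ones > n * k_ts:
            symbols.append("1")
        elif n - ones > n * k_ts:
            symbols.append("0")
        else:
            symbols.append("*")
    return "".join(symbols)


def hamming(a: str, b: str) -> int:
    """Number of positions that differ ('*' counts as its own symbol: an assumption)."""
    return sum(x != y for x, y in zip(a, b))


def should_migrate(pop: Population, k_b: float) -> bool:
    """Migration is triggered when the Bias exceeds the threshold K_B."""
    return bias(pop) > k_b


def pick_migration_source(own_schema: str, other_schemas: List[str]) -> int:
    """Index of the subpopulation whose Temporal Schema differs most from ours."""
    return max(range(len(other_schemas)),
               key=lambda i: hamming(own_schema, other_schemas[i]))
```

A subpopulation would call `should_migrate` once per generation and, when it returns true, request migrants from the subpopulation returned by `pick_migration_source`.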



2.2 Cooperative Search Using Multiform Subpopulations

In Island Parallel GA, since the population is divided into small
subpopulations, random genetic drift strongly affects the evolution of each
subpopulation. So, even if the parameters of the genetic operations are the
same for every subpopulation, the genetic constructions of the subpopulations
come to differ. Thus Island Parallel GA implicitly performs a cooperative
search using multiform subpopulations.

We propose to make the subpopulations multiform explicitly by setting the
parameters of the genetic operations to different values, aiming to improve
the adaptability of the algorithm to the target problems.

Ishikawa's model [5] performs a cooperative search using four subpopulations
whose generation gaps are set to 1.0, 0.7, 0.4 and 0.1. In his algorithm,
subpopulations with a low generation gap search the neighborhoods of
high-fitness individuals intensively, while subpopulations with a high
generation gap supply new individuals.



3 Applying APGA to Knapsack Problem 

We compare Asynchronous Island Parallel GA (APGA) with Synchronous Island
Parallel GA (SPGA) by applying them to the Knapsack Problem, a typical
combinatorial optimization problem.

The Knapsack Problem is defined as follows. Loads, each of which has a weight
and a value, are packed into a knapsack, and we search for the combination
that maximizes the total value within the weight limit. Given $N$ loads, let
the weight and value of the $j$-th load be $w_j$ and $v_j$, and let the
weight limit of the knapsack be $W$. Let $x_j = 1$ when the $j$-th load is
packed and $x_j = 0$ when it is not. The Knapsack Problem is formulated as
follows.
$$\text{maximize} \quad \sum_{j=1}^{N} v_j x_j$$

$$\text{subject to} \quad \sum_{j=1}^{N} w_j x_j \le W, \qquad x_j \in \{0, 1\}, \quad j = 1, \ldots, N \qquad (4)$$
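As an illustration of how the formulation above maps onto a GA fitness function, the following Python sketch evaluates a bit string as a knapsack packing. The text does not say how infeasible individuals (those exceeding the weight limit) are treated; assigning them a fitness of zero is an assumption made here, and the function name is hypothetical.

```python
from typing import Sequence


def knapsack_fitness(x: Sequence[int],
                     weights: Sequence[int],
                     values: Sequence[int],
                     capacity: int) -> int:
    """Total value of the packed loads, or 0 if the weight limit is exceeded.

    The zero penalty for infeasible packings is an assumption of this sketch.
    """
    total_weight = sum(w for w, bit in zip(weights, x) if bit)
    total_value = sum(v for v, bit in zip(values, x) if bit)
    return total_value if total_weight <= capacity else 0
```

For the 300-load instance used in the experiments, `x` would be a 300-bit individual and `weights` and `values` the load parameters.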



3.1 Experiment 

A population of 512 individuals is divided into 16 or 32 subpopulations.
In APGA, migration is performed asynchronously when the Bias has exceeded the
threshold $K_B$ for 20 generations. In SPGA, on the other hand, migration is
performed synchronously every 20 generations. Both APGA and SPGA are applied
to a Knapsack Problem with 300 loads. Every trial is run for 1000
generations, and the results are averaged over 10 trials.

3.2 Results and Discussion 

Fig. 1 shows the comparison between APGA and SPGA. It indicates that the
fitness of APGA is lower than that of SPGA in the early generations but
overtakes it in the later generations.

Fig. 2 and Fig. 3 show the changes of Bias in one of the subpopulations of
APGA and SPGA, respectively, when applied to the Knapsack Problem. They
indicate that the genetic construction of APGA remains more diverse than that
of SPGA.

In APGA, although the search efficiency falls, the diversity of the genetic
construction recovers successfully, because migrants are introduced from the
subpopulation that has the most different genetic construction. In SPGA, on
the other hand, locally optimal solutions gradually spread through the whole
population, the genetic constructions of all subpopulations become uniform,
and the effect of migration is lost.

4 Applying APGA to Royal Road Functions 

In this section, we apply APGA to the Royal Road Functions $R_1$ and $R_4$,
which are evaluation functions proposed by Mitchell. Royal Road Functions are
designed to embody the building block hypothesis explicitly.

Royal Road Functions are defined as follows. $R_1$ uses an 8-bit binary
string as one building block, and one individual consists of 8 building
blocks, that is, of one 64-bit binary string. Fitness is awarded when all 8
loci that constitute a building block are set to '1'. The optimum value is
obtained when all 64 loci of the individual are set to '1'.

Fig. 2. The change of Bias in one of the subpopulations of APGA in applying to
Knapsack Problem

Fig. 3. The change of Bias in one of the subpopulations of SPGA in applying to
Knapsack Problem

$R_4$ is an extension of $R_1$ that uses a 128-bit binary string as one
individual. An 8-bit binary string constitutes the lowest-order building
block, and two adjoining building blocks constitute a higher-order building
block; that is, a second-order building block consists of a 16-bit binary
string and a third-order building block consists of a 32-bit binary string.
Fitness is awarded when a higher-order building block is constructed. The
optimum value is obtained when all 128 loci are set to '1'.
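The two functions can be sketched as follows. The text only states that fitness is awarded for completed building blocks, so the reward used here (each completed block contributes its own length) and the number of hierarchy levels (`levels=3`, matching the three orders named above) are assumptions of this sketch, and the function names are hypothetical.

```python
from typing import Sequence


def royal_road_flat(bits: Sequence[int], block: int = 8) -> int:
    """R1-style score: each fully set block of `block` loci adds `block`."""
    return sum(block
               for start in range(0, len(bits) - block + 1, block)
               if all(bits[start:start + block]))


def royal_road_hierarchical(bits: Sequence[int], block: int = 8, levels: int = 3) -> int:
    """R4-style score: blocks of size block, 2*block, 4*block, ... are all rewarded.

    A completed block at any level adds its own length (an assumption).
    """
    score = 0
    size = block
    for _ in range(levels):
        score += sum(size
                     for start in range(0, len(bits) - size + 1, size)
                     if all(bits[start:start + size]))
        size *= 2
    return score
```

With this reward, an individual that completes two adjoining 8-bit blocks is credited both for the two first-order blocks and for the second-order block they form, which is what makes the higher-order structure visible to selection.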

4.1 Experiments 

We carried out two experiments: an evaluation of our migration scheme, and an
evaluation of cooperative search using multiform subpopulations.

Experiment 1: Migration in APGA We evaluate our migration scheme by
applying APGA to $R_1$ and $R_4$. A population of 1024 individuals is divided
into 4, 8, 16 or 32 subpopulations. Every $R_1$ trial is run for 200
generations and every $R_4$ trial for 2000 generations. The results are
averaged over 30 trials.



Experiment 2: Cooperative Search in APGA Using Multiform Subpopulations
We compare the single parameter model with the multiform parameters model by
applying them to $R_4$. A population of 1024 individuals is divided into 32
subpopulations. Every $R_4$ trial is run for 2000 generations. The results
are averaged over 30 trials.

In the single parameter model, we set the triple (generation gap, mutation
rate, threshold of Bias $K_B$) to (0.8, 0.01, 0.9).

In the multiform parameters model, we set the pair (generation gap, mutation
rate) to (0.8, 0.01) and (0.5, 0.05). The pair (0.8, 0.01) emphasizes
crossover and the pair (0.5, 0.05) emphasizes mutation. We set the threshold
of Bias $K_B$ to 0.9 and 0.8: $K_B = 0.9$ emphasizes a careful search within
each subpopulation, and $K_B = 0.8$ emphasizes the global search by actively
introducing migrants. Combining these parameter sets gives four kinds of
subpopulation, with (generation gap, mutation rate, threshold of Bias $K_B$)
set to (0.8, 0.01, 0.9), (0.8, 0.01, 0.8), (0.5, 0.05, 0.9) and
(0.5, 0.05, 0.8).
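The four kinds of subpopulation are simply the Cartesian product of the two operator settings and the two Bias thresholds, as the short Python sketch below shows. How the four kinds are distributed over the 32 subpopulations is not stated in the text; the round-robin assignment in `assign_kinds` is an assumption, and the names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# (generation gap, mutation rate): crossover-oriented and mutation-oriented settings
OPERATOR_SETTINGS = [(0.8, 0.01), (0.5, 0.05)]
# threshold of Bias K_B: careful local search (0.9) and active migration (0.8)
BIAS_THRESHOLDS = [0.9, 0.8]

KINDS: List[Tuple[float, float, float]] = [
    (gap, mutation, k_b)
    for (gap, mutation), k_b in product(OPERATOR_SETTINGS, BIAS_THRESHOLDS)
]


def assign_kinds(num_subpops: int) -> List[Tuple[float, float, float]]:
    """Give each subpopulation one of the four kinds in round-robin order (assumption)."""
    return [KINDS[i % len(KINDS)] for i in range(num_subpops)]
```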

4.2 Results and Discussion 

Fig. 4 and Fig. 5 show the results of the first experiment. With four
subpopulations, the search converged in the early generations in all
experimental results. As the number of subpopulations increases and, at the
same time, the number of individuals in each subpopulation decreases, the
convergence of the population as a whole becomes slower, because random
genetic drift affects each subpopulation strongly and the genetic
constructions of the subpopulations come to differ. Therefore, our migration
scheme is not suitable for a simple problem such as $R_1$. On the other hand,
for a difficult problem such as $R_4$, our migration scheme is very efficient
and a good solution is obtained, because maintaining the diversity of the
genetic construction prevents subpopulations from falling into locally
optimal solutions. Furthermore, since the subpopulations are diversified by
random genetic drift, each subpopulation searches for separate building
blocks simultaneously. Therefore, APGA is suitable for problems that conform
strongly to the building block hypothesis.

Fig. 6 shows the result of the second experiment. The multiform parameters
model improved the accuracy compared with the single parameter model. We
think the reason is that the multiform subpopulations are allotted different
tasks and cooperate with each other effectively. However, we have not yet
investigated the multiform parameters model sufficiently to substantiate this
explanation, and further study is needed.

5 Conclusion 

In this research, we proposed Asynchronous Island Parallel GA, in which
multiform subpopulations migrate asynchronously according to the state of
each subpopulation. Its effect on combinatorial optimization problems was
verified by applying the algorithm to the Knapsack Problem and Royal Road
Functions on a CRAY-T3E parallel computer. Through these experiments, the
following results were obtained.

The migration scheme proposed in this research is effective for combinatorial
optimization problems. In particular, when our algorithm is applied to a
problem that conforms strongly to the building block hypothesis, such as the
Royal Road Functions, it performs effectively through the parallel search of
building blocks. In the search using multiform subpopulations, the search
efficiency becomes worse, but the accuracy improves because the diversity of
each subpopulation's genetic construction is maintained.



Abstract. An efficient approach to solving the multiple sequence alignment
problem is presented in this paper. The approach is based on a parallel
genetic algorithm (PGA) that runs on a networked parallel environment.
The algorithm optimizes an objective function, 'weighted sums of pairs',
which measures alignment quality. Using isolated independent subpopulations
of alignments in a quasi-evolutionary manner, the approach gradually improves
the fitness of the subpopulations as measured by the objective function.
This parallel approach is shown to perform better than the sequential
approach and an alternative method, Clustal W. An investigation of the
parameters of the algorithm further confirms the better performance.



1 Introduction 

Simultaneous alignment of many DNA or Protein sequences is an important tool 
in molecular biology. Multiple alignments are used to study molecular evolution, 
to help predict the secondary or tertiary structure of new sequences, RNA fold- 
ing and gene regulation. Basically, there have been two approaches to solving 
the problem of multiple sequence alignment: rigorous optimization by dynamic 
programming and heuristic algorithms. 

Extending dynamic programming for pairwise sequence alignment to the multiple
alignment of N sequences is limited to small numbers of short sequences.
It requires memory space for an N-dimensional array and calculation time of
the order of the Nth power of the sequence length. Using the Carrillo and
Lipman algorithm [3], the Multiple Sequence Alignment (MSA) program attempts
to reduce the search space to a relatively small area [7]. Even with this
reduction, it is limited to ten sequences at most. Therefore, all of the
methods capable of handling larger problems in practical timescales make use
of heuristics.

The most widely used heuristic approach is the 'progressive alignment' of
Feng and Doolittle [3]. In this approach, the sequences are aligned in an
order imposed by some estimated phylogenetic tree. It first aligns the most
closely related sequences, gradually adding in the more distant ones. Some of
the most widely used multiple alignment programs, such as Clustal W, Multal
and Pileup, are based on this algorithm. This approach has the great
advantages of speed and simplicity as well as reasonable sensitivity. Its
main drawback is the 'local minimum' problem that stems from the greedy
nature of the algorithm. Stochastic heuristics such as simulated annealing or
genetic algorithms can be used to avoid this pitfall.

Simulated annealing has been used for multiple alignment but can be very
slow and usually only works well as an alignment improver. Genetic algorithms
have been used to find globally optimal multiple alignments starting from
completely unaligned sequences. The stochastic methods have the advantages
of flexibility and lower complexity. They do not place any strong
restrictions on the number of sequences to align or on the length of those
sequences, and they can be used to optimize any objective function.

Since a genetic algorithm for multiple sequence alignment generally requires
a long computation time, it is desirable to use parallel genetic algorithms.
We have developed a parallel genetic algorithm for multiple sequence
alignment that runs on a networked parallel environment. We show that our
parallel approach performs better than a sequential genetic algorithm when
applied to the multiple sequence alignment problem.

The contributions of this paper are: 

— comparison of the performance of our algorithm with other sequence align- 
ment methods; 

— comparison of the alignment quality of our parallel approach with the se- 
quential genetic algorithm running under same constraints; 

— an investigation of the influence of some parallelization parameters on the 
alignment results. 

We use the term PGA to describe an island genetic algorithm with isolated
independent subpopulations. The rest of the paper is organized as follows.
The second section describes the problem formulation of multiple alignment.
The parallel approach is discussed in section three. The fourth section gives
the implementation details and results. The last section draws conclusions
and summarizes the present work.

2 Problem Formulation 

One can define a biosequence multiple alignment as an alignment of the
residues of a number of nucleic acid or protein sequences in which a gap
character, '-', is used as a spacer so that each sequence has the same number
of residues plus gaps in the alignment. A column is a set of characters
(residues plus gaps), one from each sequence, written one over the other. The
multiple alignment is composed of the union of its columns. Each multiple
alignment induces a pairwise alignment on each pair of its sequences.

Let $S_1, \ldots, S_K$ be the input sequences and assume that $K$ is at least
2. Let $\Sigma$ be the input alphabet; we assume that $\Sigma$ does not
contain the character '-', so that a dash can be used to denote a gap in the
alignment. Algorithms that construct multiple sequence alignments require a
cost model as a criterion for constructing an optimal alignment [7].

In the simplest cost model there is a cost function
$sub: \Sigma' \times \Sigma' \to \mathbb{N}$, where $\Sigma'$ is the input
alphabet extended with the gap character '-'. It is defined such that
$sub(a, b)$ is the cost of substituting a $b$ in the second sequence for an
$a$ in the first sequence; $sub(-, b)$ is the cost for columns where the
first sequence has a gap and the second has a $b$; and $sub(a, -)$ is the
cost for columns where the first sequence has an $a$ and the second has a
gap. The cost of the pairwise alignment $A_{ij}$ induced by a multiple
alignment $A$ of width $w$ is

$$c(A_{ij}) = \sum_{1 \le k \le w} sub\left( A[i][k], A[j][k] \right),$$

where $A[i][k]$ is the character of sequence $S_i$ in column $k$ of $A$.
With this, the basic weighted sum of pairs multiple sequence alignment
problem is to minimize the pairwise sum

$$c(A) = \sum_{i<j} \alpha_{ij} \, c(A_{ij}),$$

where $\alpha_{ij}$ is the weight given to the pair of sequences $S_i$ and
$S_j$.
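A minimal Python sketch of the weighted sum of pairs cost is given below. It is not the scoring used in the experiments, which relies on a substitution matrix and affine gap penalties: the toy `sub` function (0 for a match, 1 for a mismatch, 2 for a residue against a gap, 0 for gap against gap) is a stand-in chosen only to keep the example self-contained, and all names are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence

GAP = "-"


def sub(a: str, b: str) -> int:
    """Toy substitution cost (an assumption, not the cost model of the paper)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return 2
    return 0 if a == b else 1


def pairwise_cost(row_i: str, row_j: str) -> int:
    """Cost of the pairwise alignment induced by two rows of equal width."""
    return sum(sub(a, b) for a, b in zip(row_i, row_j))


def weighted_sum_of_pairs(alignment: Sequence[str],
                          weights: Optional[Sequence[Sequence[float]]] = None) -> float:
    """Sum over all pairs i < j of weight[i][j] * pairwise cost (weights default to 1)."""
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1.0 if weights is None else weights[i][j]
        total += w * pairwise_cost(alignment[i], alignment[j])
    return total
```

Lower values correspond to better alignments under this cost model, which is the sense in which the objective is minimized.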



3 Description of PGA 

Genetic algorithms are highly suitable for parallelization, and different
ways exist to parallelize them. A parallel genetic algorithm with isolated
subpopulations, or island genetic algorithm, is used to obtain better problem
solutions. The population is divided into a few subpopulations, or demes, and
each of these relatively large demes evolves separately on a different
processor. Exchange between subpopulations is possible via a migration
operator. A set of $n$ individuals is assigned to each of $N$ processors, for
a total population size of $n \times N$.

Initial subpopulations that consist of randomly constructed alignments are
created at each processor. Each processor, disjointly and in parallel,
executes the sequential genetic algorithm on its subpopulation for a certain
number of generations (the migration interval). Afterwards, each
subpopulation exchanges a specific number of individuals (migrants) with its
neighbors. We exchange the individuals themselves, i.e., the migrants are
removed from one subpopulation and added to another. Hence, the size of the
population remains the same after migration. The process continues with the
separate evolution of each subpopulation for a certain number of generations.
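The alternation of isolated evolution and migration can be sketched as below. This is a sequential illustration only: in the PGA each deme runs on its own processor, whereas here the demes are simply visited in turn. The ring topology with migrants sent to the right neighbor follows the experimental setup described later in the paper; the `evolve` callback, which stands for one generation of the sequential genetic algorithm, and the function names are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def migrate_ring(demes: List[List[T]],
                 cost: Callable[[T], float],
                 n_migrants: int) -> None:
    """Move the n_migrants lowest-cost individuals of each deme to its right neighbor.

    Migrants are removed from their source deme, so every deme keeps its size.
    """
    outgoing = []
    for deme in demes:
        deme.sort(key=cost)                 # best (lowest cost) first
        outgoing.append(deme[:n_migrants])
        del deme[:n_migrants]
    for i, migrants in enumerate(outgoing):
        demes[(i + 1) % len(demes)].extend(migrants)


def run_islands(demes: List[List[T]],
                evolve: Callable[[List[T]], None],
                cost: Callable[[T], float],
                generations: int,
                interval: int,
                n_migrants: int) -> None:
    """Evolve every deme in isolation and migrate every `interval` generations."""
    for generation in range(1, generations + 1):
        for deme in demes:
            evolve(deme)                    # one generation, in place (hypothetical callback)
        if n_migrants > 0 and generation % interval == 0:
            migrate_ring(demes, cost, n_migrants)
```

Setting `n_migrants` to 0 gives the isolated variant without migration.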

3.1 Characteristics of PGA 

Representation As a DNA or protein sequence alignment consists of both
alphabet characters and a gap character, it is best to represent alignments
as they are, so that there is no need for any encoding or decoding mechanism.
Along with all the input sequences, other important information such as the
lengths of the sequences, the number of sequences and the score is placed in
a structure. The initial subpopulation at each processor is created randomly.
It consists of a set of alignments containing only terminal gaps. Alignments
are created by choosing a random offset for each of the sequences and then
moving each sequence to the right according to its offset.

Fitness function The fitness of each individual in a subpopulation is
calculated by scoring each alignment according to the 'weighted sum of pairs'
objective function. The overall alignment cost is calculated by adding a
substitution cost and a gap cost for each pair of aligned residues in each
column of the alignment, with their weights. We use the PAM250 substitution
matrix and natural affine gap penalties for calculating the fitness function.
Sequence weights are calculated using the rationale 1 method developed by
Altschul.
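The random-offset initialization described under Representation can be sketched as follows. The text does not say how large the offsets may be, so the `max_offset` parameter is an assumption, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def random_offset_alignment(seqs: Sequence[str],
                            max_offset: int,
                            rng: random.Random) -> List[str]:
    """Shift each sequence right by a random offset and pad with terminal gaps.

    Every row of the result has the same width, and gaps occur only at the ends.
    """
    offsets = [rng.randint(0, max_offset) for _ in seqs]
    width = max(offset + len(seq) for offset, seq in zip(offsets, seqs))
    return ["-" * offset + seq + "-" * (width - offset - len(seq))
            for offset, seq in zip(offsets, seqs)]
```

Calling the function repeatedly with the same `rng` yields the set of alignments that forms an initial subpopulation.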



Genetic operators In this parallel implementation we used all the operators
of a sequential genetic algorithm approach for multiple sequence alignment. A
detailed description of these operators can be found in [9].

Selection 

We use an overlapping generation technique, in which half of the population
survives unchanged and the other half is replaced by children in each
generation. Individuals are ranked according to their fitness, and the new
children replace the weakest individuals. The expected offspring of an
individual is calculated from its fitness and is used as the probability of
that individual being chosen as a parent. Parents are selected for breeding
according to their expected offspring values by roulette wheel selection.
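One generation of this selection scheme might look like the Python sketch below. The text does not give the formula that turns fitness into expected offspring, so it is passed in as a callback; the `breed` callback, which stands for the application of a genetic operator to two parents, is likewise hypothetical. Drawing parents from the whole ranked population (rather than from the survivors only) is an assumption.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def next_generation(pop: List[T],
                    cost: Callable[[T], float],
                    expected_offspring: Callable[[T], float],
                    breed: Callable[[T, T], T],
                    rng: random.Random) -> List[T]:
    """Keep the better half unchanged and replace the weaker half with children."""
    ranked = sorted(pop, key=cost)                 # best (lowest cost) first
    survivors = ranked[: len(ranked) // 2]
    weights = [expected_offspring(ind) for ind in ranked]
    children: List[T] = []
    while len(survivors) + len(children) < len(pop):
        # Roulette wheel: selection probability proportional to expected offspring.
        mother, father = rng.choices(ranked, weights=weights, k=2)
        children.append(breed(mother, father))
    return survivors + children
```

Note that `rng.choices` samples with replacement, so the same individual can be drawn as both parents; a real implementation may want to exclude that case.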

Crossover 

A new alignment is created by the crossover operator by combining two
different alignments. Both one-point crossover and uniform crossover are
implemented. In one-point crossover, two parent alignments are combined
through a single exchange. The first parent is cut straight at some randomly
chosen position and the second one is tailored so that the right and left
pieces of each parent can be joined together while keeping the original order
of the sequence of amino acids. Any void space that appears at the junction
point is filled with null signs. Because of this filling, the operator
combines the properties of traditional crossover with those of mutation. The
better of the two children produced in this way is kept in the population.

The uniform crossover is designed to promote multiple exchanges between
areas of homology. Consistent positions are first identified in both parents.
Two positions are said to be consistent if each column contains the same
residues or a null coming from the same gap. Blocks between consistent
positions are swapped to create a new alignment.

Block shuffling 

A block of residues or a block of gaps can be moved inside an alignment
using this operator. A set of overlapping stretches of residues from one or
more sequences is called a block of residues. The subsequences can be of
different lengths, but all of them must overlap. A block is chosen by first
selecting one residue or gap position from the alignment, and it is then
moved to a specified position.

Gap insertion

The sequences are split into two groups based on an estimated phylogenetic
tree. A gap of randomly chosen length is inserted in each of the sequences of
one group at a randomly chosen position. A gap of the same length is also
inserted into all of the sequences of the second group at a position that has
the maximum distance from the first gap insertion.

Block searching 

Given a substring in one of the sequences, this operator finds the block to
which it may belong in an alignment. A substring of random length at a random
position in one of the sequences is compared with all substrings of the same
length in the other sequences. The best-matching one is selected and added to
the initial string, forming a small profile. Then the best match to the
profile is located among the remaining sequences and added to it. This
process continues until a match has been identified in all the sequences.

4 Implementation and results 

The algorithm has been implemented on PARAM 10000, a parallel machine
developed at the Center for Development of Advanced Computing (CDAC). The
algorithm is implemented in the C language using PVM. The results were
obtained with the machines running their normal daily loads in addition to
this algorithm.

4.1 Comparison of PGA with other sequence alignment algorithms 

A set of four test cases was chosen from the Pascarella structural alignment
database. The results of PGA are compared with those of Clustal W, a
well-known multiple sequence alignment method. The Clustal W algorithm is
based on the greedy 'progressive alignment' approach. It does not explicitly
optimize any objective function. Despite these limitations, by choosing an
appropriate set of parameters, we evaluated the Clustal W score in conditions
where it would produce a result as close as possible to the optimization of
the weighted sums of pairs objective function.

The results are presented in Table 1 and show that in all four test cases PGA
builds an alignment with a better score than Clustal W. PGA is implemented
with five subpopulations of size 20, giving a total population size of 100.
The run time of PGA is averaged over the runs that led to the presented
results.



Table 1. Comparison of PGA with CLUSTAL W

                                 CLUSTAL W                   PGA
Test Case    Nseq  Length       Score   CPU Time        Score   CPU Time
                                       (in secs)               (in secs)
Dfr             4     186       21316      5.440        21104        110
Gcr             8      48       39103      4.790        38903        636
Globin         15     169     4248160     13.019      4222846       1178
S protease     15     292    26285660     22.351     25970235       2815



4.2 Investigation on migration parameters

The PGA alternates between maintaining isolated subpopulations in different
environments and introducing individuals into new environments. The fitness
values of the individuals in the subpopulations are altered by migrating
better individuals between subpopulations. Migration depends on various
parameters, such as how often it takes place, how many individuals migrate,
which individuals migrate, and the size and number of neighboring
subpopulations. To understand the specific effects of these parameters we
performed several experiments. All the results presented in Table 2 are
normalized as the percentage exceeding the best score, with the percentage
averaged over five runs. For comparison purposes, we also applied a
sequential genetic algorithm (SGA) to the total population. In all the
experiments, the PGA and the sequential genetic algorithm were run for the
same total number of generations. To demonstrate the importance of migration,
we also report the results achieved by PGA with '0 Migrants'.



Table 2. Alignment score with different numbers of migrants and migration
intervals. All results are averaged over five runs and normalized as percent
exceeding the best-known score in Table 1. Thus, the smaller the value, the
better the average alignment score

                                       Migration Interval
                                    25 gen.           50 gen.
Test case         SGA      Mig     Migrants          Migrants
                             0        1        2        1        2
Dfr             0.062    0.046    0.036    0.037    0.031    0.035
Gcr             0.404    0.272    0.238    0.247    0.221    0.242
Globin          0.163    0.026    0.027    0.030    0.017    0.028
S protease      0.754    0.528    0.329    0.348    0.282    0.320
Average         0.346    0.218    0.158    0.166    0.138    0.156
% SGA             100       63       46       48       40       45



Number of Migrants and Migration Interval The influence of the migration
interval for different numbers of migrants is investigated. The better
individuals were chosen as migrants and sent to the right neighbor on a ring
topology. Table 2 shows that the sequential approach is outperformed by all
parallel variations, including the version without any migration. Thus,
splitting the total population into subpopulations that evolve in parallel
increases the probability that at least one of these subpopulations will
evolve toward a better result.
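The Average and % SGA rows of Table 2 follow directly from the four test-case rows, and the short Python sketch below recomputes them as a consistency check. The values are entered in thousandths of the table's unit so that the arithmetic stays exact; the column labels and function names are chosen here for illustration.

```python
from typing import Dict, List

COLUMNS = ["SGA", "0 migrants", "25 gen / 1", "25 gen / 2", "50 gen / 1", "50 gen / 2"]

# Table 2 entries multiplied by 1000 (e.g. 0.062 becomes 62).
ROWS: Dict[str, List[int]] = {
    "Dfr":        [62, 46, 36, 37, 31, 35],
    "Gcr":        [404, 272, 238, 247, 221, 242],
    "Globin":     [163, 26, 27, 30, 17, 28],
    "S protease": [754, 528, 329, 348, 282, 320],
}


def column_averages(rows: Dict[str, List[int]]) -> List[float]:
    """Mean of each column over the test cases."""
    return [sum(column) / len(rows) for column in zip(*rows.values())]


def percent_of_sga(averages: List[float]) -> List[float]:
    """Each average expressed as a percentage of the SGA average (first column)."""
    return [100 * a / averages[0] for a in averages]


if __name__ == "__main__":
    averages = column_averages(ROWS)
    for name, avg, pct in zip(COLUMNS, averages, percent_of_sga(averages)):
        print(f"{name:12s} average = {avg / 1000:.3f}  % SGA = {pct:.0f}")
```

The recomputed averages agree with the published row to within rounding, and the smallest value falls in the column for one migrant every 50 generations.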

Table 2 also shows that limited migration between the subpopulations further
enhances the advantage of the PGA. One migrant to each neighbor every 50
generations gave the best results when averaged over all the test cases. On
the other hand, a small migration interval degrades the performance of the
parallel evolving subpopulations: it reduces the genetic diversity between
the subpopulations, which then search in the limited space of almost
identical individuals.



Fig. 1. The convergence of the best alignment score in the individual, parallel
evolving subpopulations. Plotted are five runs with five subpopulations each,
i.e. 25 curves, with one migrant



Figure 1 shows the convergence behavior of the best individuals in each of
the parallel evolving subpopulations for the case of S protease. All the
results were obtained with five subpopulations of size 20 and with one
migrant every 50 generations. Plotted are five runs with five subpopulations
each, i.e. 25 curves. The plot also indicates the importance of migration in
avoiding premature stagnation by inserting new individuals into a stagnating
subpopulation.



5 Conclusions 

We presented a PGA for the multiple sequence alignment problem. We have shown
here that the PGA outperforms the most widely used package, Clustal W, for
all the test cases. The results showed that, when applied to the sequence alignment
problem, the PGA consistently performs better than a sequential genetic algorithm.
A set of experiments has been performed in order to evaluate the effects of
migration parameters on the PGA. As a result, a small number of migrants combined
with a ’moderate’ migration interval was found empirically to lead to the best results.
This algorithm can be easily ported to any distributed networked environment.
Work is in progress on finding a new migration scheme based on the principles of
bird migration and on improving the performance of the PGA using some local search
algorithms.
Abstract. In this paper, an algorithm is presented for learning concept
classification rules. It is a hybrid between evolutionary computing and 
inductive logic programming (ILP). Given input of positive and nega- 
tive examples, the algorithm constructs a logic program to classify these 
examples. The algorithm has several attractive features including the 
ability to explicitly use background (user-supplied) knowledge and to 
produce comprehensible output. We present results of using the algo- 
rithm to tackle the chess-endgame problem (KRK). The results show 
that using fitness proportionate selection to bias the population of ILP 
learners does not significantly increase classification accuracy. However, 
when rules are exchanged at intermediate stages in learning, in a manner 
similar to crossover in Genetic Programming, the predictive accuracy is 
frequently improved. 



1 Introduction 

This work addresses the classification problem in machine learning. That is, given
training examples of the form $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ for some unknown
function $y = f(x)$, where the $x_i$ values are vectors of the form
$\langle x_{i,1}, x_{i,2}, \ldots, x_{i,n} \rangle$ and the $y$ values are drawn from a
discrete set of classes $\{1, \ldots, K\}$, find a definition of the function $f$ such
that the $y$ value for any $x_j$ from the same distribution is accurately predicted [1].

Evolutionary algorithms (EA) have in the past successfully been used for 
the classification problem. GABIL [3], and REGAL, |H], for example, create 
a mapping between logic expressions and fixed-length binary strings. A genetic 
algorithm then searches the space of strings. To evaluate strings they are mapped 
to the corresponding logical expression which may then be interpreted. Other 
work has been done on suitably modifying the operators in genetic algorithms
to manipulate logical expressions directly, e.g. SAMUEL [12] and GLPS [13].

However, as the hypothesis language becomes increasingly expressive, the 
space of logical expressions that needs to be searched grows combinatorially. 
There is accumulating evidence that indicates that the performance of evolu- 
tionary algorithms can be improved through the introduction of a local search 
method, HE). A local method progresses by refining an existing solution. Instead 
of considering the entire search space, a small subset - the solution’s neighbourhood -
is examined. This can result in the rapid location of good solutions.
However, if all the points in the neighbourhood are inferior, then the algorithm 
becomes trapped. Unless the local method is perturbed in some way, no fur- 
ther improvements can be made. Evolutionary algorithms are relatively robust 
against such local maxima, but are poor at local refinement. 

Furthermore, evolutionary algorithms do not easily lend themselves to using
domain knowledge explicitly. Typically, they begin tabula rasa. The only real
alternative is to seed, or bias, the initial population of candidate solutions towards
likely answers. However, this only makes certain classes of solution less likely; it
does not allow solutions to be declared invalid. For example, Wong and Leung’s
GLPS [13] induces logic programs by evolving a suitably chosen representation
such that syntactic correctness is guaranteed. However, the algorithm
does not easily allow background knowledge to be used to constrain the space
of solutions. The semantics of expressions are ignored.

The aim of this work is to combine the local search properties of an in- 
ductive logic programming algorithm with the global search properties of an 
evolutionary algorithm. The algorithm presented in this paper uses a common 
language to express input and output, namely function-free Horn clauses. As a 
result knowledge can be supplied a priori in the form of rules and facts; this 
knowledge constrains the space of candidate solutions and thus eliminates from 
consideration solutions known to be inappropriate. 

The remaining sections of this paper introduce inductive logic programming; 
describe EVIL_1, a hybrid ILP-EA algorithm; and finally present an empirical 
study of learning performance on a chess-endgame problem. 



2 Evolutionary Inductive Logic Programming 

2.1 Inductive Logic Programming: a Brief Introduction 

Inductive logic programming (ILP), [^, is an approach to inducing concept de- 
scriptions that draws on the foundations of logic programming. The task tackled 
by ILP is that of developing predicate descriptions given training examples and 
background knowledge. More specifically, given sets of positive ($E^+$) and negative
($E^-$) examples and relevant background knowledge $B$, construct a hypothesis
$\mathcal{H}$ that is consistent and complete with respect to the training data and
background knowledge.

The search through the space of logic programs is structured. A partial order- 
ing is imposed on the space of hypothesis clauses and this orders hypotheses by 
generality and allows large parts of the search space to be pruned. For example, if 
a clause C does not cover a positive example, then none of the specialisations of 
C need be considered. In practice, however, this strict condition must be relaxed 
in order to tolerate noise in training data. 

Inductive logic programming is an appealing local search method as it allows 
the easy incorporation of domain specific knowledge. Furthermore, it produces 
comprehensible solutions. However, as ILP algorithms are typically based on the
set covering algorithm, a greedy search algorithm, they are susceptible to local 
maxima. 

2.2 Evolutionary Algorithms 

Evolutionary algorithms are domain independent search algorithms inspired by
principles of population genetics. Using very simple mechanisms, evolutionary
algorithms exhibit complex behaviour that has been harnessed to solve some
difficult problems, e.g. [2,6]. However, evolutionary algorithms have certain
drawbacks. Domain knowledge cannot be used easily. Furthermore, while such
algorithms are good at locating peaks in discontinuous multimodal objective
functions, they have poor local search properties.

2.3 Integrating the EA with ILP 

Inductive logic programming and evolutionary algorithms have appealing properties
which appear to be complementary. Evolutionary algorithms have good
global search properties, whereas inductive logic programming algorithms have
good local refinement characteristics. This raises the question: can a concept
learning algorithm be constructed that captures both of these properties?

Furthermore, when only a subset of the training set is seen by an ILP al- 
gorithm the theories induced are likely to be less accurate than if all the data 
had been used. However, in most real world applications, data will necessarily 
only become available gradually, or will be too great to use in one batch for the 
learner. It is therefore interesting to observe how algorithms perform when only 
samples of the data are used. 

2.4 The EVIL_1 Algorithm 

In the approach adopted, an evolutionary algorithm is used to direct the com- 
putational effort spent by multiple parallel instances of the ILP algorithm. The 
evolutionary algorithm maintains a population of agents, each comprising a
logical theory (a logic program). The traditional mutation and crossover operators
are replaced with the crossover operator used in El which in some respects
is similar to crossover in Genetic Programming. 

Each agent is able to induce logic programs using an ILP algorithm. An
agent takes as input a random sample of the training data and induces a theory.
This theory is evaluated on a validation set. Those theories with poor predictive
accuracy risk extinction, while those with high predictive accuracy are likely to
occupy a larger proportion of the population in the next generation. As new rules
are discovered they are added to the theory. This augmented theory, together
with the background knowledge, is then used in subsequent trials. The fitness of
a theory is measured by determining its predictive accuracy on the validation
set^1, which comprises the entire training set. The algorithm is shown below.

^1 It should be noted that this is not to be confused with the test set, which is used
only for evaluation independently of any learning.
procedure EVIL_1 is
begin
    initialise(population);
    fitness = accuracy(population, validation_set);

    while not termination_criterion loop
        for each member of population loop
            theory = select_parent(population);
            subset = sample(training_set, sample_size);
            new_theory = induce(background_knowledge, theory, subset);
        end loop

        fitness = accuracy(population, validation_set);
        new_population = select(population);
        population = new_population;
    end loop

    return fittest(population);
end EVIL_1;



The induce procedure refers to a call of the inductive logic programming 
algorithm. The algorithm chosen was Progol, m- 
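
The control flow of the procedure can also be sketched in Python. This is an illustration under stated assumptions rather than the authors' code: `induce` and `accuracy` are caller-supplied stand-ins (the paper calls Progol at this point), the fitness-proportionate step is folded into the choice of the parent theory, and the toy learner at the bottom is entirely hypothetical.

```python
import random


def evil_1(training_set, background, induce, accuracy,
           pop_size=10, sample_size=20, generations=5, seed=0):
    """Sketch of the EVIL_1 loop; `induce` and `accuracy` are supplied by the caller."""
    rng = random.Random(seed)
    validation_set = training_set               # validation uses the entire training set
    population = [[] for _ in range(pop_size)]  # every theory starts empty
    for _ in range(generations):
        fitness = [accuracy(t, validation_set) for t in population]
        weights = [f + 1e-9 for f in fitness]   # avoid an all-zero roulette wheel
        new_population = []
        for _ in range(pop_size):
            # Fitness-proportionate choice of the theory to be refined.
            parent = rng.choices(population, weights=weights, k=1)[0]
            subset = rng.sample(training_set, sample_size)
            new_population.append(induce(background, parent, subset))
        population = new_population
    return max(population, key=lambda t: accuracy(t, validation_set))


# Hypothetical toy learner: a "theory" is the list of inputs known to be positive.
def toy_induce(background, theory, subset):
    return sorted(set(theory) | {x for x, label in subset if label})


def toy_accuracy(theory, examples):
    return sum((x in theory) == label for x, label in examples) / len(examples)


examples = [(x, x % 2 == 0) for x in range(40)]
best = evil_1(examples, background=None, induce=toy_induce, accuracy=toy_accuracy)
```

Because each call to `induce` starts from the selected parent theory, rules found in earlier generations are carried forward, which mirrors the statement that new rules are added to the theory and reused in subsequent trials.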



3 An Empirical Study 

The aim of this empirical investigation is to examine the effect on predictive 
accuracy of (1) applying fitness proportionate selection on a population of ILP 
algorithms that only use a sample of the training set; and (2) of exchanging rules 
between ILP algorithms at intermediate stages. 

The task domain is the Chess Endgame (KRK) problem, proposed by [8].
This problem is a widely used test problem for ILP systems. The problem may
be characterised as follows. There are three pieces left on a chess board: the
white king, the white rook, and the black king. The objective of the learning
algorithm is to discover rules that describe illegal positions when it is white’s turn
to move, given a set of positive and negative training instances. For example,
an illegal position occurs when the black king is in check with white to move.
The predicate illegal/6 is the target to be learned, and the attributes of the
target predicate are the file and rank of the white king, white rook and black king
respectively. Examples, therefore, take the form illegal(e,3,a,1,e,1) and
¬illegal(d,4,g,3,b,5) (where ¬ denotes negation). These examples correspond
to board positions as illustrated in Figure 1. The data comprise
20000 examples, which are split into 10000 training and validation instances and
10000 test instances.
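
To make the target concept concrete, the two example positions can be checked against a hand-written oracle. The sketch below is not part of the learning system and is not what the learner induces; it simply encodes the usual chess conditions for an illegal KRK position with white to move (two pieces on one square, adjacent kings, or the black king attacked by the rook), and the function name is an assumption.

```python
FILES = "abcdefgh"


def is_illegal(wk_file, wk_rank, wr_file, wr_rank, bk_file, bk_rank):
    """Return True if a KRK position with white to move is illegal."""
    wk = (FILES.index(wk_file), wk_rank)
    wr = (FILES.index(wr_file), wr_rank)
    bk = (FILES.index(bk_file), bk_rank)
    # Two pieces may not share a square.
    if wk == wr or wk == bk or wr == bk:
        return True
    # The kings may not stand on adjacent squares.
    if abs(wk[0] - bk[0]) <= 1 and abs(wk[1] - bk[1]) <= 1:
        return True
    # The black king may not be in check from the rook (same file or same rank),
    # unless the white king stands between the two pieces.
    for axis in (0, 1):
        other = 1 - axis
        if wr[axis] == bk[axis]:
            lo, hi = sorted((wr[other], bk[other]))
            blocked = wk[axis] == wr[axis] and lo < wk[other] < hi
            if not blocked:
                return True
    return False


print(is_illegal("e", 3, "a", 1, "e", 1))   # True: rook on a1 checks the king on e1
print(is_illegal("d", 4, "g", 3, "b", 5))   # False
```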

illegal(e,3,a,1,e,1).          not(illegal(d,4,g,3,b,5)).

Fig. 1. Examples of legal and illegal positions.

The following background knowledge is also supplied. The adj/2 predicate
defines cases where the rank or file represented by the left argument is adjacent
to that represented by the right argument. The lt/2 predicate defines pairs of
ranks or files, where the left argument is less than the right. Consequently, rules
may take the form

illegal(A,B,C,D,C,C).
illegal(A,B,C,D,E,F) :- adj(A,E).

A population of 10 learning agents is supplied with subsets of the training
set, and each agent is allowed to induce a hypothesis using the Progol inductive logic
programming algorithm. In each generation, each agent is supplied with a 0.2%
random sample of the training set^2. Two experiments were conducted.

The first experiment considers the effect of fitness proportionate selection.
Two cases are considered: (1) multiple instances of ILP are run batch incrementally
without selection; and (2) the same, but with fitness proportionate selection. The
predictive accuracy on the test set was examined for both approaches. Figure 2
shows a scatter plot of the test set accuracy of the fitness-proportionate selection
case (y-axis) against the no fitness-proportionate selection case (x-axis). Points
above the line y = x reflect an improvement in predictive accuracy from the
introduction of fitness-proportionate selection. However, the distribution of points
indicates that while the introduction of fitness-proportionate selection is not
advantageous, it is also not disadvantageous.

Fig. 2. Fitness-proportionate selection versus no fitness-proportionate selection.

^2 The choice of sample size was based on a trade-off between providing enough data
for rules to be found and avoiding excessive run-times.

The second experiment examined the effect of rule exchange between learners.
At certain intervals, denoted by the communication period $\omega_c$, agents are
selected to exchange parts of their clausal theory. In the control case, $\omega_c = \infty$,
no rule exchange occurs. In the treatment cases, $\omega_c$ = 3, 5 or 10 generations.
The following types of rule exchange were considered. (1) Union: new theories
are constructed by taking the union of two parent rule sets; (2) Crossover: new
theories are constructed by exchanging rules using the crossover operator de- 
scribed in [Ill- 

Figure 3 shows scatter plots for the test set accuracy of rule exchange using
union, and Figure 4 the performance distributions for crossover. The graphs
indicate that in the cases where the ILP algorithm performs badly, the introduc- 
tion of either union or crossover increases predictive accuracy. However, in the 
cases where the ILP algorithm already performs well, union and crossover have 
a detrimental effect. The statistical significance of these results was examined
using the paired t-test. For union with periods 3, 5 and 10, the introduction of union
does not result in a statistically significant increase in predictive accuracy, and
therefore it may not reasonably be asserted that union improves performance in
the chess endgame problem. However, the exchange of rules through crossover
with high crossover frequency does result in a statistically significant increase:
$\omega_c = 3$ ($P < 0.0005$); $\omega_c = 5$ ($P < 0.05$); $\omega_c = 10$ ($P < 0.1$).
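
For reference, the paired t statistic used in such a comparison is $t = \bar{d} / (s_d / \sqrt{n})$, where $\bar{d}$ and $s_d$ are the mean and sample standard deviation of the per-run accuracy differences and $n$ is the number of paired runs. The sketch below computes it with the standard library only; the function name and the accuracy values are made up for illustration and are not the paper's data.

```python
import math
import statistics


def paired_t(treatment, control):
    """Return (t statistic, degrees of freedom) for two paired samples."""
    diffs = [a - b for a, b in zip(treatment, control)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical test-set accuracies of four paired runs (with and without exchange).
with_exchange = [0.90, 0.80, 0.70, 0.95]
without_exchange = [0.80, 0.75, 0.60, 0.90]
t_value, dof = paired_t(with_exchange, without_exchange)
```

The resulting statistic is then compared with the critical value of Student's t distribution for `dof` degrees of freedom to obtain the significance level.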

4 Conclusions 

A new hybrid evolutionary learning algorithm has been presented that induces 
first order logic clauses from examples. The algorithm has a number of attractive 
features. In particular, it allows the use of explicit background knowledge to 
constrain the space of solutions. In addition, the algorithm is inherently parallel 
and its output is sufficiently expressive to learn relational concepts. 

The algorithm’s learning properties were examined on the chess endgame
(KRK) problem. It was shown that learning with a population of ILP learners,
where fitness proportionate selection is used to bias trials towards good theories,
does not yield an increase in predictive accuracy. When rules are exchanged
using a union operation, a statistically significant increase is not observed.
However, when crossover is used to exchange rules between learners, a
significantly superior predictive accuracy is attained (P < 0.0005).

One possible explanation for these results is that the ILP algorithm is based
on a greedy algorithm, which is susceptible to local optima. Crossover, together
with fitness-proportionate selection, serves as a global strategy, which
can redirect the ILP algorithm to other areas of the search space.

Areas currently being pursued include a more detailed analysis of rule ex- 
change between inductive learners and the application of evolutionary inductive 
logic programming to more complex problem domains. 
Abstract. Genetic programming evolves Lisp-like programs rather than
fixed size linear strings. This representational power combined with generality
makes genetic programming an interesting tool for automatic programming
and machine learning. One weakness is the enormous time required
for evolving complex programs. In this paper we present a method
for accelerating evolution speed of genetic programming by active selection
of fitness cases during the run. In contrast to conventional genetic
programming in which all the given training data are used repeatedly,
the presented method evolves programs using only a subset of given
data chosen incrementally at each generation. This method is applied to
the evolution of collective behaviors for multiple robotic agents. Experimental
evidence supports that evolving programs on an incrementally
selected subset of fitness cases can significantly reduce the fitness evaluation
time without sacrificing generalization accuracy of the evolved
programs.



1 Introduction 

Genetic programming (GP) is a method for finding the most fit computer programs
by means of artificial evolution. A population of computer programs is
generated at random. The programs are then evolved into better programs using
genetic operators. The ability of a program to solve the problem is measured as
its fitness value.

The genetic programs are usually represented as trees. A genetic tree consists
of elements from a function set and a terminal set. Function symbols appear
as nonterminal nodes. Terminal symbols are used to denote actions taken
by the program. Since Lisp S-expressions can be represented as trees, genetic
programming can, in principle, evolve any Lisp program. Due to this powerful
expressiveness, GP provides an effective method for automatic programming and
machine learning.

One difficulty in genetic programming is, however, that it requires enormous
computational time. The time for evolution is proportional to the product of the
population size, the number of generations, and the data size needed for fitness
evaluation. Typical population sizes for GP range from a few hundred to several
thousand [4]. A typical run requires fifty to hundreds of generations. The data
size depends on the application. Fitness evaluation takes most of the evolution
time in GP since it requires programs to be executed against fitness cases.

In this paper we present two methods for reducing computational costs for 
genetic programming by evolving programs on a selected subset of given fitness 
cases. The idea of active data selection in supervised learning was originally 
introduced in 1991 by one of the authors for efficient training of neural networks 
mm . Motivated by this work Gathercole et al used training subsets for genetic 
programming m- Our approach is different from that of Gathercole et al. in 
that we increase the training set incrementally as the generations proceed, rather
than using the same number of fitness cases. The effectiveness of the presented
methods was tested on a multiagent learning problem in which a group of mobile
agents must together transport a large table to the goal position.

The paper is organized as follows. Section 2 describes the multiagent task. 
Section 3 presents the genetic programming approach with active data selection. 
Section 4 shows experimental results. Section 5 discusses the result. 



2 Evolving Multiagent Strategies Using Genetic 
Programming 



The table transport problem that will be used in our experiments is an example
of multi-robot applications [9]. In an n x n grid world, a single table and four
robotic agents are placed at random positions, as shown in Figure 1. A specific
location is designated as the destination. The goal of the robots is to transport
the table to the destination as a group. The robots need to move together
since the table is too heavy and large to be transported by a single robot.




Fig. 1. The environment for multiagent learning. 
Table 1. Terminals and functions of GP-trees for the table transport problem.

Type       Symbol         Description
Terminals  FORWARD        Move one step forward in the current direction
           AVOID          Check clockwise and make one step in the first direction
                          that avoids collision
           RANDOM-MOVE    Move one step in a random direction
           TURN-TABLE     Make a clockwise turn to the nearest direction of the table
           TURN-GOAL      Make a clockwise turn to the nearest direction of the goal
           STOP           Stay at the same position
Functions  IF-OBSTACLE    Check collision with obstacles
           IF-ROBOT       Check collision with other robots
           IF-TABLE       Check if the table is nearby
           IF-GOAL        Check if the goal is nearby
           PROG2, PROG3   Evaluate two (or three) subtrees in sequence




Each robot $i$ ($i = 1, \ldots, N_{robots}$) is equipped with a control program $A_i$. If
$A_i \neq A_j$ for $i \neq j$, then the control programs are said to be private. In the case
of public control programs, all instances of $A_i$ are constrained to be the same $A$.

The robots activate their programs $A_i$ in parallel to run a team trial. At the beginning
of the trial, the robot locations are chosen at random in the arena. They have
different positions and orientations. During a trial, each robot is granted a
total of $S_{max}$ elementary movements. The robot is allowed to stop in fewer than
$S_{max}$ steps if it reaches the goal. At the end of the trial, each robot $i$ gets a
fitness value, which is measured by summing the contributions from various
factors.

The objective of a GP run is to find a multi-robot algorithm that, when executed
by the robots in parallel, causes efficient group table transport behavior.
The terminal and function symbols used for GP to solve this problem are listed
in Table 1. The terminal set consists of six primitive actions: FORWARD, AVOID,
RANDOM-MOVE, TURN-TABLE, TURN-GOAL and STOP. The function set consists of
six primitives: IF-OBSTACLE, IF-ROBOT, IF-TABLE, IF-GOAL, PROG2 and PROG3.
Each fitness case represents a world of a 32 by 32 grid on which there are four
robots, 64 obstacles, and the table to be transported. A set of training cases is
used for evolving the programs.

All the robots use the same control program. To evaluate the fitness of the robots,
we made a complete run of the program for one robot before the fitness of another
was measured. The fitness value, $f_{ij}(g)$, of individual $i$ at generation $g$ against
case $j$ is computed by considering various penalty factors. These include the distance
between the target and the robot, the number of steps moved by the robot, the
number of collisions made by the robot, the distance between the starting and final
position of the robot, and the penalty for moving away from other robots. More
details can be found elsewhere 0. 

The fitness, $F_i(g)$, of program $i$ at generation $g$ is measured as the average
of its fitness values $f_{ij}(g)$ for the cases $j$ in the training set:

$$F_i(g) = \frac{1}{S} \sum_{j=1}^{S} f_{ij}(g) \qquad (1)$$

where $S$ is the number of fitness cases.

In the following section we present the active data selection method for ge- 
netic programming. 



3 Genetic Programming with Incremental Data Selection 

With each program is associated a small set of initial training cases of size $n_0$,
chosen from the base training set of size $N$. Individuals are evolved by the
usual genetic programming. In addition, the algorithm has an extra step,
incremental data inheritance (IDI), in which the data sets are evolved.

For the initial data population, a small subset of fitness cases, $D(0)$, is chosen
from the base training set $D^{(N)}$ of size $N$:

$$D(0) \subset D^{(N)}, \quad |D(0)| = n_0. \qquad (2)$$

After individuals are evolved by the usual evolutionary process (fitness evaluation,
selection, and mating to generate offspring), a portion of the training set,
$A(g)$, is chosen from the previous candidate set $C(g-1)$,

$$A(g) \subset C(g-1), \quad |A(g)| = \lambda, \qquad (3)$$

where $C(g-1) = D^{(N)} - D(g-1)$. It is then mixed with the previous training
set to make a new training set $D(g)$ for the next generation:

$$D(g) = D(g-1) \cup A(g), \quad D(g-1) \cap A(g) = \{\}. \qquad (4)$$

That is, the sequence of training sets for active GP is

$$D(0) \subset D(1) \subset D(2) \subset \cdots \subset D(G) = D^{(N)}, \qquad (5)$$

where $G$ is the maximum number of generations.
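
The growth of the training set over the generations can be sketched as a small generator. This is a minimal sketch that assumes the new cases are drawn uniformly at random from the remaining candidates, which corresponds to incremental random selection rather than to data inheritance; the function and parameter names are illustrative and do not come from the paper.

```python
import random


def incremental_schedule(base, n0, lam, generations, seed=0):
    """Yield D(0), D(1), ..., D(G), where D(g) is D(g-1) plus `lam` unused cases."""
    rng = random.Random(seed)
    candidates = list(base)
    rng.shuffle(candidates)
    # D(0): the first n0 cases of the shuffled base set; the rest stay candidates.
    current, candidates = candidates[:n0], candidates[n0:]
    yield list(current)
    for _ in range(generations):
        # A(g): `lam` cases taken from the candidates, so it is disjoint from D(g-1).
        added, candidates = candidates[:lam], candidates[lam:]
        current = current + added
        yield list(current)


sizes = [len(d) for d in incremental_schedule(range(100), n0=10, lam=3, generations=30)]
```

With 100 base cases, an initial size of 10 and an increment of 3, the last set produced after 30 generations contains all 100 cases, so the final training set coincides with the base set.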

We use a variant of uniform crossover to produce offspring data from their
parent data. Two parent data sets, $D_i(g)$ and $D_j(g)$, are crossed to pass
subsets on to two offspring data sets, $D_i(g+1)$ and $D_j(g+1)$. In uniform data
crossover, the parents’ data are mixed into a union set

$$D_{i+j}(g) = D_i(g) \cup D_j(g), \qquad (6)$$

which is then redistributed to the two offspring:

$$D_i(g+1) \subset D_{i+j}(g), \quad D_j(g+1) \subset D_{i+j}(g), \qquad (7)$$

where the size of each offspring data set is $n_{g+1} = n_g + \lambda$, and $\lambda \geq 1$ is
the data increment size.

Fig. 2. Uniform data crossover for data inheritance.

To ensure performance improvement, it is important to maintain the diversity
of the training data as the generations go on. The diversity of data set $D_i(g)$ is
measured by the ratio of distinctive examples:

$$d_i = \frac{|D_{i+j}(g)|}{|D_i(g)|} - 1, \quad 0 \leq d_i \leq 1, \qquad (8)$$

where $d_i = 0$ if the parents have the same data and $d_i = 1$ if the parents have no
common training examples. To maintain the diversity, a portion $\rho$ of the diversity
factor $d_i$ is used to import examples from the base data set:

$$r_i = \rho \cdot (1 - d_i), \quad 0 \leq \rho \leq 1. \qquad (9)$$




For example, assume that the current parents have data sets, $D_i(g)$ and
$D_j(g)$, of size $n_g = 40$ each and $|D_{i+j}(g)| = 60$. Let the parameters be $\rho = 0.3$,
$\lambda = 3$. Then, we need to generate two training sets of size $n_{g+1} = n_g + \lambda = 43$
for the offspring (Figure 2). The diversity is $d_i = 60/40 - 1 = 1.5 - 1 = 0.5$
and the import rate is $r_i = \rho \cdot (1 - d_i) = 0.3 \cdot (1 - 0.5) = 0.15$. The data for
each offspring are generated by randomly choosing 34 examples from $D_{i+j}(g)$, 6
examples ($0.15 \cdot 40$) from the base data set, and again $\lambda = 3$ examples from the
base data set. Figure 2 illustrates
this process.
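
The worked example can be reproduced with a short sketch of uniform data crossover for a single offspring. It is an illustration under assumptions, not the authors' implementation: the function name is invented, the number of imported examples is taken to be the import rate times the parent size, and imported and incremental examples are drawn from base cases that the offspring does not yet contain.

```python
import random


def uniform_data_crossover(d_i, d_j, base, rho, lam, seed=0):
    """Build one offspring data set from the parent data sets d_i and d_j."""
    rng = random.Random(seed)
    union = set(d_i) | set(d_j)                 # union of the parents' data
    n_g = len(d_i)
    diversity = len(union) / n_g - 1            # 0 for identical parents, 1 for disjoint
    import_rate = rho * (1 - diversity)
    n_import = round(import_rate * n_g)         # assumed rounding rule
    n_inherit = n_g - n_import
    child = set(rng.sample(sorted(union), n_inherit))
    # Imported and incremental examples: base cases not yet in the offspring (assumption).
    fresh = sorted(set(base) - child)
    child |= set(rng.sample(fresh, n_import + lam))
    return child, diversity, import_rate


parent_i = range(0, 40)      # 40 cases
parent_j = range(20, 60)     # 40 cases, 20 shared with parent_i, so the union has 60
child, d, r = uniform_data_crossover(parent_i, parent_j, range(100), rho=0.3, lam=3)
```

For these inputs the diversity is 0.5, the import rate is 0.15, and the offspring contains 34 inherited, 6 imported and 3 incremental examples, 43 in total, as in the example above.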
4 Experimental Results

Experiments have been performed using the parameter values listed in Table
2. The terminal set and function set consist of six primitives each, as
summarized in Table 1. A total of 100 training cases were used for evolving the
programs for standard GP runs. GP runs with active data selection used $10 + 3g$
examples out of the given data set at generation $g$, i.e. $n_0 = 10$, $\lambda = 3$, for
fitness evaluation. For all methods, a total of 100 independent worlds were used for
evaluating the generalization performance of the evolved programs.

We compared the performance of the GP with active data selection to the
GP with random data selection. Results are shown in Figure 3. GPs with IDI
and incremental random selection (IRS) achieved better fitness than GP without active
selection (GP standard). Figure 4 shows the fitness of the three methods with respect
to the total number of evaluations. Since the GP with active data selection uses
a variable data size, we calculated the number of evaluations at generation $g$ as
the product of the population size and the data size at generation $g$. The active
GP methods achieved a speed-up factor of approximately two compared with
the standard GP. The results are summarized in Table 3. Though the
GP with active data selection used a smaller set of fitness cases, its
training and test performance were slightly better than those of the standard
GP. Though further experiments are necessary for a more definite conclusion, it
seems that the active GP has the potential to evolve smaller programs than the
standard GP, since a small data set usually tends to require smaller programs. This
seems interesting from the Occam’s razor principle point of view PSE]. 
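
The evaluation counts behind the reported speed-up can be checked with a few lines of arithmetic. The sketch assumes that the Time column of Table 3 counts fitness-case evaluations, that a standard run spans 30 generations of 100 cases, and that the active runs are indexed from generation 0 to generation 30; these indexing choices are assumptions, made because they reproduce the tabulated values.

```python
def evaluations_standard(pop_size, n_cases, generations):
    """Every generation evaluates every program on every fitness case."""
    return pop_size * n_cases * generations


def evaluations_incremental(pop_size, n0, lam, last_generation):
    """Generation g uses n0 + lam * g fitness cases, for g = 0 .. last_generation."""
    return sum(pop_size * (n0 + lam * g) for g in range(last_generation + 1))


standard = evaluations_standard(100, 100, 30)            # 300000
incremental = evaluations_incremental(100, 10, 3, 30)    # 170500
speed_up = standard / incremental                        # about 1.76
```

The ratio of about 1.76 is in line with the speed-up factor of approximately two stated in the text.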



Table 2. Parameters used in the experiments.

Parameter          Value
Population size    100
Max generation     30
Crossover rate     0.9
Mutation rate      0.1




Table 3. Comparison of time and average fitness values (lower is better) for the
standard GP and the GP with active data selection. The values are averaged over ten runs.
Also shown are the standard deviations.

                         Average fitness
Method         Time      Training          Test
GP standard    300000    211.21 ± 9.19     225.64 ± 12.05
GP with IRS    170500    209.60 ± 7.67     219.91 ± 13.11
GP with IDI    170500    195.97 ± 8.41     203.39 ± 10.78

Fig. 3. Comparison of fitness values as a function of generation number.

Fig. 4. Comparison of fitness values as a function of the number of function evaluations.

5 Conclusions 

We have presented a method for accelerating the evolution speed of genetic
programming by selecting a subset of the given fitness cases. Since the fitness
evaluation step is a bottleneck in GP computing time, this method can make an
essential contribution to improving GP performance.

Experimental results have shown that by reducing the fitness cases the evolution
speed of GP can be enhanced without loss of generality of the evolved
programs. This is especially true for problem settings in which a large number
of fitness cases is available. In this case, active data selection can exploit
the redundancy in the data, while the standard GP blindly re-evaluates all the
fitness cases.
Abstract. Novel methods for obstacle avoidance and for acquiring a final
position and orientation are developed and implemented for fast-moving
mobile robots. Most obstacle avoidance techniques do not consider
the robot orientation or its final angle at the target position. These
techniques deal with the robot position only and are independent of its
orientation and velocity. To solve these problems we propose a novel
uni-vector field method, which introduces a normalized two-dimensional
vector field for navigation. To obtain the optimal vector field, a function
approximator is used, which is trained by evolutionary programming.
Two kinds of vector fields are trained, one for the final posture acquisition,
and the other for obstacle avoidance. Computer simulations and
real experiments are carried out to demonstrate the effectiveness of the
proposed scheme.

Keywords: Navigation, Wheeled mobile robots, Uni-vector field method,
Evolutionary programming, Soccer robots.



1 Introduction 

Navigation with obstacle avoidance is one of the key issues to be addressed for
successful applications of autonomous mobile robots. Navigation involves three
tasks: mapping and modeling the environment, path planning and selection, and
path following. The traditional navigation method separates path planning and
following into two isolated tasks. In contrast, in unified navigation such as the
potential field method, these two steps are unified in one task [1].

Conventional navigation methods do not consider the robot orientation and
its final angle at the target position. For instance, when a robot dribbles a
ball in a robot soccer game [2,3] or pushes a load in an industrial field, it is
very important to acquire the final robot orientation. Using the conventional
methods, a robot has difficulties in performing such tasks. Moreover, in the path
planning step, the generated path ignores the mechanical properties of the robot.

In this paper, a novel uni-vector field method is proposed for unified
navigation, considering the kinematic properties of the robot and the
practical application to fast-moving mobile robots. To obtain the optimal
uni-vector field, a function approximator and its learning algorithm based on
evolutionary programming (EP) are proposed. By introducing the uni-vector
fields, the performance of the unified navigational approach is improved along
with the obstacle avoidance capability. The developed navigation is
implemented on a bi-wheel type mobile robot designed for MiroSot [3].

In Section 2, the kinematic properties of the bi-wheel type mobile robot are
discussed. In Section 3, a novel uni-vector field navigation method is
described as a unified navigation approach, based on EP. Sections 4 and 5
describe computer simulations and experimental results, respectively.
Concluding remarks follow in Section 6.



2 Modeling of a Mobile Robot 



In this paper, two-wheeled mobile robots with pure rolling and no slipping are
considered [4]. The mechanical structure of the mobile robot is shown in
Fig. 1(a). The kinematics of the robot can be described using Fig. 1(b).
Posture $P_s$ and position $p$ of the robot are defined as

$$P_s = \begin{bmatrix} x_c \\ y_c \\ \theta_c \end{bmatrix}, \qquad p = \begin{bmatrix} x_c \\ y_c \end{bmatrix} \tag{1}$$

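where $(x_c, y_c)$ is the position of the center of the robot, and $\theta_c$
is the heading angle of the robot with respect to absolute coordinates.
Velocity vector $S$ is defined as follows:

$$S = \begin{bmatrix} v \\ \omega \end{bmatrix} = \begin{bmatrix} \dfrac{V_R + V_L}{2} \\[2mm] \dfrac{V_R - V_L}{L} \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[1mm] -\tfrac{1}{L} & \tfrac{1}{L} \end{bmatrix} \begin{bmatrix} V_L \\ V_R \end{bmatrix} \tag{2}$$
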
where $v$ is the translational velocity of the center of the robot, $\omega$
is the rotational velocity with respect to the center of the robot, and $L$ is
the distance between the two wheels. Equation (2) shows the relation between
the velocity vector and the velocities of the two wheels, $V_L$ and $V_R$,
where $V_L$ is the left wheel velocity and $V_R$ is the right wheel velocity.
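The mapping in Equation (2) and its inverse can be sketched in a few lines of
Python. This is only an illustration of the kinematic relation; the function
names and the argument `wheel_base` (standing for $L$) are assumptions made
for the example and do not come from the paper.

```python
def wheel_to_body(v_left, v_right, wheel_base):
    """Map wheel speeds (V_L, V_R) to (v, omega) as in Eq. (2)."""
    v = (v_right + v_left) / 2.0             # translational velocity of the center
    omega = (v_right - v_left) / wheel_base  # rotational velocity about the center
    return v, omega


def body_to_wheel(v, omega, wheel_base):
    """Inverse of Eq. (2): recover (V_L, V_R) from (v, omega)."""
    v_left = v - omega * wheel_base / 2.0
    v_right = v + omega * wheel_base / 2.0
    return v_left, v_right
```

A tracking controller that outputs a desired rotational velocity can use the
inverse mapping to obtain the wheel commands that are sent to the robot.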




(a) Shape of the robot (b) Robot modeling

Fig. 1. Shape of bi-wheel type mobile robot and its modeling
3 Uni-vector Field Navigation Method

The potential field method is generally used in robot control. As it is very
simple, it makes real-time robot control possible. However, when a regular
velocity cannot be maintained or the obstacle is big, the robot is liable to
oscillate, and its direction cannot be guaranteed at the arrival position [5].
These problems are due to the approximation of the robot as a point mass. In
potential field navigation the robot moves in the direction of a resultant
force composed of an attractive force from the desired position and a
repulsive force from the obstacle to be avoided. The resultant force can be
considered as a vector field. It is possible to control the robot better with
a modified vector field, if we can find the optimal one. In this paper, we
introduce a uni-vector field in which the magnitude of the vectors is unity at
all positions.

3.1 Uni-vector field generation 

A uni-vector field $N$ for robot navigation is defined as

$$N : F \to I \tag{3}$$

where $F$ is the workspace of the robot in $\mathbb{R}^2$ and $I$ is a set of
unit vectors with arbitrary direction. While controlling the robot, these unit
vectors correspond to the desired robot heading directions. As normalized
vectors are used, the uni-vector field $N$ can be represented in terms of its
directions, as follows:

$$\theta_N : F \to [-\pi, \pi]. \tag{4}$$

Fig. 2 shows an example of the uni-vector field for a desired posture at a
point $g$, where the points and straight lines are the uni-vector field, and
the trajectories of rectangles are simulated robot paths. The uni-vector field
at position $p$ is defined as $N(p)$. It is assumed that the magnitude of the
vectors in the field is unity at all points. The angle of the vector at a
robot position $p$ is generated by

$$\theta_N(p) = \angle\overline{pg} - n\phi \tag{5}$$

with $\phi = \angle\overline{pr} - \angle\overline{pg}$

where $n$ is a positive constant. The shape of the field and the turning
motion of the robot vary with the parameter $n$ and the distance between
points $g$ and $r$. By this equation, we can obtain a uni-vector field at all
points for the desired posture at point $g$. This uni-vector field was
implemented in the robot soccer system [3] for the kicking motion, where the
point $g$ was the ball position and the heading position $r$ was adjusted to
the desired kicking direction. As shown in Fig. 2, this heuristic uni-vector
field has inefficient properties. For example, a robot behind the point $g$
follows a long path to approach the final posture.
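The heuristic field of Equation (5) can be sketched as follows. The sketch
assumes that the angle of "pg" is the direction of the vector from $p$ to $g$
(computed with `atan2`) and that angles are wrapped to $[-\pi, \pi)$; the
function names and the default `n=1.0` are illustrative choices, not values
taken from the paper.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def heuristic_field_angle(p, g, r, n=1.0):
    """Field direction at position p for target g and heading point r, Eq. (5).

    Assumption: the angle of 'pg' is the direction of the vector from p to g.
    """
    ang_pg = math.atan2(g[1] - p[1], g[0] - p[0])
    ang_pr = math.atan2(r[1] - p[1], r[0] - p[0])
    phi = wrap_angle(ang_pr - ang_pg)
    return wrap_angle(ang_pg - n * phi)
```

For a robot on the line through $g$ and $r$ and behind $g$, $\phi = 0$ and the
field simply points at $g$; off that line, the term $n\phi$ bends the path so
that the robot approaches $g$ along the desired direction.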

In order to exploit the vector field $N$ for robot control with better
performance, the field has to be adjusted efficiently. A function approximator
is introduced for this purpose.

Fig. 2. Heuristic uni-vector field for final posture at a point g

To start with, a grid of size n x m is placed within the workspace, as shown
in Fig. 3(a). The shape and density of the grid net can be varied according to
the application and the desired accuracy. $p_{ij}$ is the position of node
$(i,j)$ and $N_{ij}$ represents the field vector at $p_{ij}$.

The set of field vectors $N_{ij}$ forms an n x m matrix,
$\{N_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$. The vector associated with an
arbitrary position $p$ in $F$ is calculated with the function approximator as
follows:



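$$N(p) = \frac{d_b d_c d_d\, N_{i+1,j+1} + d_a d_c d_d\, N_{i,j+1} + d_a d_b d_d\, N_{i+1,j} + d_a d_b d_c\, N_{i,j}}{d_b d_c d_d + d_a d_c d_d + d_a d_b d_d + d_a d_b d_c} \tag{6}$$

with $d_a = \|p - p_{i+1,j+1}\|$, $d_b = \|p - p_{i,j+1}\|$,
$d_c = \|p - p_{i+1,j}\|$, $d_d = \|p - p_{i,j}\|$

where $p_{i,j}$, $p_{i,j+1}$, $p_{i+1,j}$ and $p_{i+1,j+1}$ are the positions
of the nodes surrounding the point $p$, as shown in Fig. 3(b). $N(p)$ in (6)
represents an intermediate vector for the $N_{i,j}$, $N_{i,j+1}$, $N_{i+1,j}$
and $N_{i+1,j+1}$ vectors. As $p$ approaches $p_{i,j}$, $N(p)$ converges to
$N_{i,j}$.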

Consequently, by setting the elements of the matrix
$\{N_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$ to the node values, all the
vectors in the field $N$ can be fully determined. In Section 3.3 the training
of the vector field $N$ is discussed.
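The interpolation rule of Equation (6) can be sketched as below: each node
vector is weighted by the product of the distances from $p$ to the other three
nodes, so a node dominates as $p$ approaches it. The function name and the
data layout (a list of four position/vector pairs) are assumptions made for
the example.

```python
import math


def interpolate_field(p, corners):
    """Interpolate a field vector at p from four surrounding grid nodes, Eq. (6).

    corners: list of four (node_position, node_vector) pairs, each a 2-tuple
    of 2-tuples. The result is not re-normalized here.
    """
    dists = [math.dist(p, pos) for pos, _ in corners]
    weights = []
    for k in range(4):
        w = 1.0
        for m in range(4):
            if m != k:
                w *= dists[m]  # product of the distances to the other nodes
        weights.append(w)
    total = sum(weights)
    vx = sum(w * vec[0] for w, (_, vec) in zip(weights, corners)) / total
    vy = sum(w * vec[1] for w, (_, vec) in zip(weights, corners)) / total
    return vx, vy
```

At a node the weights of the other three nodes vanish, which reproduces the
convergence property stated in the text; at the center of a square cell all
four weights are equal and the result is the plain average of the node vectors.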




(a) (b)

Fig. 3. Grid net of the function approximator
3.2 Uni-vector field tracking controller

To apply the vector field method for navigation, a field tracking controller
is required. The control inputs to the wheels reduce the error in angle
between the robot heading direction and the field vector. The error in angle
$\theta_e$ between the robot heading angle $\theta_c$ and the vector
orientation $\theta_N$ is given as follows:

$$\theta_e = \theta_c - \theta_N. \tag{7}$$

Let us employ the following rotational velocity:

$$\omega = G(x_c, y_c, \theta_c)\,\|\dot{p}_c\| + K_\omega\,\mathrm{sgn}(\theta_e)\sqrt{|\theta_e|} \tag{8}$$

with $G(x_c, y_c, \theta_c) = \dfrac{\partial \theta_N}{\partial x}\cos\theta_c + \dfrac{\partial \theta_N}{\partial y}\sin\theta_c$,

where $K_\omega$ is a positive constant and sgn is the sign function. Then
$\theta_e$ will become zero within a time
$T \ge 2\sqrt{|\theta_e(0)|}/K_\omega$ [6]. $G(x, y, \theta_c)$ is the product
of the gradient of $\theta_N$ and a unit vector in the direction of
$\theta_c$. In other words, it is the variation of $\theta_N$ in the direction
of $\theta_c$ at the current position $(x, y)$. The term
$G(x, y, \theta_c)\,\|\dot{p}_c\|$ refers to the variation of $\theta_N$ at
the robot center in unit time. Equation (8) represents a kind of sliding mode
controller.
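The finite-time property of such a square-root law can be checked with a short
calculation. The sketch assumes that the feed-forward term cancels the change
of $\theta_N$ along the path and that the sign convention makes the
square-root term act against the error, so that the closed-loop error obeys
the first line below; this convention is an assumption of the sketch and
should be checked against the signs printed in (7) and (8).

```latex
\begin{aligned}
\dot{\theta}_e &= -K_\omega\,\mathrm{sgn}(\theta_e)\sqrt{|\theta_e|},\\
\frac{d}{dt}\sqrt{|\theta_e|}
  &= \frac{\mathrm{sgn}(\theta_e)\,\dot{\theta}_e}{2\sqrt{|\theta_e|}}
   = -\frac{K_\omega}{2},\\
\sqrt{|\theta_e(t)|} &= \sqrt{|\theta_e(0)|} - \frac{K_\omega}{2}\,t
  \quad\Longrightarrow\quad
  \theta_e(T) = 0 \ \text{at}\ T = \frac{2\sqrt{|\theta_e(0)|}}{K_\omega}.
\end{aligned}
```

Under this assumption the error reaches zero in finite time, rather than only
asymptotically as with a purely proportional law.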



3.3 EP and the learning algorithm 

To control a fast mobile robot, many conditions should be satisfied, and these
are difficult to represent. Researchers focus on some of them, depending on
their interest and the application field. EP is an efficient tool for
optimizing such a complex system. The evaluation function is determined from
the elapsed time, the distance from the target position, the distance from the
obstacle, the final orientation of the robot heading, and the maximal
rotational acceleration.

These criteria are merged to form an evaluation function $f(x)$ as follows:

$$f(x) = k_t t_s + k_d\,|\theta_c(t_s) - \theta_d| + f_t(x) + f_o(x) + f_a(x) \tag{9}$$

where $t_s$ is the elapsed time, and $\theta_d$ is the desired final
direction. The evaluation function is used for learning the uni-vector field
matrix $\{N_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$.
The first term in the evaluation function helps the robot to reach the target
point without wasting time, and the second term forces the robot heading to
converge to the desired final direction $\theta_d$. The function $f_t(x)$
makes the robot move to the target point:

$$f_t(x) = \begin{cases} 0, & \text{if arrived at } p_g \\ G_p + \min_{t \in [0, t_s]} |p(t) - p_g|, & \text{otherwise} \end{cases} \tag{10}$$

where $p(t)$ is the position of the robot center at time $t$, $p_g$ is the
target position and $G_p$ is a penalty value that is added when a robot does
not arrive at $p_g$. If the robot does not reach the target position, the
evaluation function increases depending on the distance from the robot center
to the target point: the value $\min_{t \in [0, t_s]} |p(t) - p_g|$ added to
$G_p$ gives $f_t(x)$, as in equation (10). The function $f_o(x)$ prevents the
robot from colliding with an obstacle:



$$f_o(x) = \begin{cases} 0, & \text{when not in collision with an obstacle} \\ B_p + \max_{t \in t_c} |p(t) - p_b|, & \text{otherwise} \end{cases} \tag{11}$$

where $B_p$ is a penalty value, $t_c \subset [0, t_s]$ refers to the time
interval during which the robot is within an obstacle boundary, and $p_b$ is
the closest point on the obstacle boundary from the robot center. When a robot
collides with an obstacle, $f_o(x)$ is calculated from the point of the robot
trajectory nearest to the obstacle center; the shortest distance from that
point to the periphery of the obstacle is used as the distance term. The
function $f_a(x)$ keeps the robot rotational acceleration from exceeding its
limit $a_{max}$:

$$f_a(x) = \begin{cases} 0, & \text{when } \dot{\omega} \text{ is within the limit } a_{max} \\ A_p + \max_t |\dot{\omega}(t) - a_{max}|, & \text{otherwise} \end{cases} \tag{12}$$

In the computer simulations, the penalty values $G_p$, $B_p$, and $A_p$ are
taken as 500, 100, and 50, respectively. The value of $G_p$ is greater than
the sum of the other two penalty values, as we assumed that arrival at the
target point is the most important condition to be satisfied in robot
navigation.
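The evaluation function (9)-(12) can be sketched for a sampled trajectory as
follows. Several details are assumptions made for the example and are not
specified in the paper: the weights `k_t` and `k_d`, the arrival tolerance
`arrive_tol`, a single circular obstacle, and a finite-difference estimate of
the rotational acceleration. The penalty defaults 500, 100 and 50 are the
values quoted in the text.

```python
import math


def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def evaluate(traj, dt, target, theta_d, obs_center, obs_radius, a_max,
             k_t=1.0, k_d=1.0, arrive_tol=1.0,
             G_p=500.0, B_p=100.0, A_p=50.0):
    """Score one simulated run (lower is better), following Eqs. (9)-(12).

    traj is a list of (x, y, theta, omega) samples taken every dt seconds.
    """
    t_s = dt * (len(traj) - 1)  # elapsed time
    heading_err = abs(wrap_angle(traj[-1][2] - theta_d))

    # f_t, Eq. (10): penalty if the robot never gets within arrive_tol of the target.
    d_goal = min(math.dist((x, y), target) for x, y, _, _ in traj)
    f_t = 0.0 if d_goal <= arrive_tol else G_p + d_goal

    # f_o, Eq. (11): deepest penetration into the (assumed circular) obstacle.
    depth = max(obs_radius - math.dist((x, y), obs_center) for x, y, _, _ in traj)
    f_o = B_p + depth if depth > 0.0 else 0.0

    # f_a, Eq. (12): largest excess of |d omega / dt| over the limit a_max.
    acc = [abs(traj[k + 1][3] - traj[k][3]) / dt for k in range(len(traj) - 1)]
    excess = max(acc, default=0.0) - a_max
    f_a = A_p + excess if excess > 0.0 else 0.0

    return k_t * t_s + k_d * heading_err + f_t + f_o + f_a
```

Because each penalty is a constant plus a distance-like term, a run that
violates a constraint is always ranked below one that satisfies it, while the
added term still tells the search which violating runs are closer to feasible.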

In the EP algorithm, self-adaptive Gaussian mutation is used. For details on
constrained optimization by evolutionary algorithms, the reader is referred
to [7].



4 Computer Simulations 

Computer simulations were carried out using two kinds of vector fields (for
final posture and for obstacle avoidance) on a Pentium IBM PC, considering the
kinematic model of the robot. For each individual, the simulation is carried
out 25 times with uniformly distributed random starting positions. Throughout
the simulations, the elitist (($\mu + \lambda$)-EP) selection method was used,
and the grid nets used in the following simulations have a circular form with
size 10 x 6. The maximum speed of the wheels was 100 cm/s, that of the center
of the robot was 50 cm/s, and the maximal rotational acceleration of the robot
was 10 rad/sample. The number of individuals was 20, and the number of
offspring was 40.
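An elitist ($\mu + \lambda$) scheme with self-adaptive Gaussian mutation can
be sketched as below, with $\mu = 20$ and $\lambda = 40$ as in the
simulations. The log-normal step-size update, the learning rates `tau` and
`tau_prime`, the initial step size 0.5, and the encoding of an individual as a
flat list of node angles are common choices assumed for the example; the paper
does not state these details.

```python
import math
import random


def evolve(fitness, dim, mu=20, lam=40, generations=100, seed=0):
    """Minimise `fitness` with an elitist (mu + lambda) evolutionary loop."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(dim))
    tau_prime = 1.0 / math.sqrt(2.0 * dim)

    # Each individual is (fitness, parameters, per-parameter step sizes).
    scored = []
    for _ in range(mu):
        x = [rng.uniform(-math.pi, math.pi) for _ in range(dim)]
        scored.append((fitness(x), x, [0.5] * dim))
    scored.sort(key=lambda ind: ind[0])

    for _ in range(generations):
        offspring = []
        for k in range(lam):
            _, x, s = scored[k % mu]      # every parent is mutated lam / mu times
            common = rng.gauss(0.0, 1.0)  # shared part of the step-size update
            new_s = [si * math.exp(tau_prime * common + tau * rng.gauss(0.0, 1.0))
                     for si in s]
            new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_s)]
            offspring.append((fitness(new_x), new_x, new_s))
        # Elitist selection: the best mu of parents and offspring survive.
        scored = sorted(scored + offspring, key=lambda ind: ind[0])[:mu]

    best_f, best_x, _ = scored[0]
    return best_x, best_f
```

In the setting of this section, `dim` would be the number of grid nodes (60
for a 10 x 6 grid net) and `fitness` would average the evaluation function
over the 25 random starting positions; both mappings are assumptions about how
the pieces fit together.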



4.1 Uni-vector field for final posture

In this case, it was assumed that the final position is the center of the
field (0,0) and the final orientation is to the right (0 rad). Figs. 4(a),
4(b) and 4(c) show the best vector fields obtained at each generation.
Fig. 4(c) shows that the constraints are satisfied or traded off.
(a) 10th generation (b) 50th generation (c) 500th generation

Fig. 4. Simulation results for final posture acquisition

(a) 10th generation (b) 50th generation (c) 500th generation

Fig. 5. Simulation results for obstacle avoidance



4.2 Uni-vector field with obstacles 

Figs. 5(a), 5(b) and 5(c) show the results for circular obstacle avoidance.
The final position is at the right border of each frame. As the generations go
on, the navigation becomes more satisfactory.



5 Experiments 

To demonstrate the effectiveness of the proposed scheme, it is implemented in
a real robot system. The overall system is composed of a robot, a host
computer, a vision system, and a communication system. The vision system
detects the position and orientation of the robot and the obstacle. Using this
vision information, the host computer applies the proposed navigation method
to calculate the velocities of the robot wheels. The calculated wheel
velocities are transmitted to the robot through the communication system.

The vision system is composed of a TMC-7 CCD camera with a resolution of
320 x 240 pixels and an image grabber with a processing rate of 30 frames/s.
The vision system in the experimental setup has measurement errors of about
2.4 cm for position and 4.83 degrees for angle calculations. The host computer
is a Pentium processor with a 133 MHz clock. The mobile robot was developed
for playing the MiroSot robot-soccer game [3]. The robot size is
7.5 cm x 7.5 cm x 7.5 cm, with a wheel width of 6.5 cm. The robot has an
AT89C52 micro-controller, two DC motors, and two LM629 motion controllers. In
the experiment, a sampling time of 33 ms is used. Other conditions are the
same as in the simulation. Fig. 6(a) shows the case of final posture
acquisition without an obstacle. Fig. 6(b), (c), and (d) show the cases with
obstacle avoidance. The radius of the obstacle considered is 6 cm. The arrows
in Fig. 6 show the desired final orientation of the robot. The robots move to
the final position and converge to the final heading angle along a short and
smooth path without collision. Fig. 6 shows good performance for all cases.

Fig. 6. Experimental results for desired robot postures

6 Conclusions 

A novel navigation method for obstacle avoidance and final posture acquisition
was developed and implemented. This method was obtained by introducing a
modifiable uni-vector field into unified navigation. To obtain the optimal
vector field, a function approximator and its learning algorithm were
proposed. The developed navigation was implemented on a bi-wheel type mobile
robot. As seen from both simulations and experiments, the proposed method is
useful for fast mobile robot control and for robots performing more complex
tasks.
Abstract. In this paper, we consider a society of economic agents. Economic
agents are defined as autonomous software entities equipped with adaptive
functions, which are defined over a market governed by the market mechanism.
We focus on an evolutionary explanation of how the social competence that
motivates coordinated behavior can emerge from interactions guided by the
selfish behaviors of economic agents. In particular, we need to understand the
following basic issues: how to design the architecture of an agent, as a
component of a complex system, so that it is suited for evolution; how
self-interested behaviors evolve into coordinated behaviors; and how the
structure of each goal (adaptive) function can be modified for globally
coordinated behaviors. We also show that the concept of sympathy becomes a
fundamental element for adaptation and coordinated behavior.

Keywords: emergence of optimal behaviors, market mechanism, economic agent,
emotion



1 Introduction 

In a large-scale complex adaptive system composed of many rational agents, two
types of strategic behavior may occur: agents interact with one another and
behave so as to achieve the common goal of the society, while at the same time
each agent also behaves so as to optimize its own goal. An individual rational
agent behaves to improve its own adaptive function based on its local
observation. This ability is based on the principle of individual rationality.
By a social goal we mean a goal that is not achievable by any single agent
alone but is achievable by a society of agents. The key element that
distinguishes a social goal from an agent’s individual goal is that it
requires cooperation. How, then, will the evolution of individually rational
behaviors proceed to coordinated behavior?

We describe a model of economic agents as the basis for social cooperation
that is learnable through competitive interaction. We call this ability
competitive cooperation. Economic agents are driven by their own selfish
motivations: they only do what they want to do and what they think is in their
own best interests. The collective behavior of those agents is determined
through the local interactions of their constituent parts. These interactions
merit careful study in order to understand the macroscopic properties of
collective behaviors. We especially ask the following questions: If agents
make decisions on the basis of imperfect information about other agents’ goals
or adaptive functions, and incorporate expectations of how their decisions
will affect other agents’ adaptive functions, then how will the evolution of
cooperation proceed? How should the structure of the adaptive function of each
agent be self-modified for the evolution of social cooperation?

In this paper, we study the evolution of social cooperation without losing the
principle of competition in a society. Social learning is defined as the set
of mechanisms which utilize the adaptive decision-making of economic agents.
In the adaptive decision-making mechanism, each economic agent modifies its
own adaptive function by reflecting its sympathy level toward the other
members. With the principle of sympathy, it adapts its own decision based on
its current and previous performance. The goal of an agent is determined
solely by how that agent affects other members of the society and how the
decisions of other agents affect its own adaptive function. Adaptive
decision-making is facilitated by designing the agents to have a somewhat
modified selfish interest.

Social learning allows agents to achieve high-level social goals without the
need for cooperative planning and communication. Thus, over time, a society of
economic agents will be able to learn to cooperate at an even higher level of
efficiency and adaptability. Under this social learning, cooperation emerges
as a side-effect of the adaptive decision-making, which leads the agents to
learn cooperative behaviors without sacrificing the principle of competition
in the society.



2 A Model of Economic Agents 

We consider a society of economic agents, $G = \{A_i : i = 1, 2, \ldots, n\}$.
Economic agents are defined as autonomous software entities equipped with
adaptive capabilities. Each has its own adaptive function, which is a function
of the market price governed by the market mechanism. We define the adaptive
function of each agent $A_i \in G$ as

$$U_i(x_1, \ldots, x_i, \ldots, x_n) = x_i P_i(x_i, x(i)) \tag{1}$$

where $P_i(x_i, x(i))$ represents the price scheme associated with the
activity of agent $A_i \in G$, $x_i$ represents the level of activity of agent
$A_i \in G$, and $x(i) = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$
represents the set of activities of all agents in $G$ except agent $A_i$.

As a specific example, we consider the following social price scheme for each
agent $A_i \in G$:

$$P_i = a_i - \sum_{j=1}^{n} b_{ij} x_j \tag{2}$$

where $a_i$, $b_{ij}$, $i, j = 1, 2, \ldots, n$, are positive constants.
The competitive solution, in which each agent maximizes its own adaptive
function simultaneously, is given as the solution of the following system of
linear equations:

$$(B + B_1)x = a \tag{3}$$

where $B$ is an n x n matrix whose $(i,j)$-th element is $b_{ij}$,
$i, j = 1, 2, \ldots, n$, $B_1$ is a diagonal matrix whose $i$-th diagonal
element is $b_{ii}$, and $x$ and $a$ are the column vectors with the elements
$x_i$ and $a_i$, $i = 1, 2, \ldots, n$, respectively.

We define the socially optimal behavior as the set of activities that optimize
the summation of the adaptive functions of all agents, defined as

$$S(x_1, \ldots, x_i, \ldots, x_n) = \sum_{i=1}^{n} U_i(x_1, \ldots, x_i, \ldots, x_n) \tag{4}$$

The socially optimal behavior is then obtained as the set of activities
satisfying the following equations:

$$\partial S/\partial x_i = \partial U_i/\partial x_i + \sum_{j \ne i} \partial U_j/\partial x_i = 0, \quad i = 1, 2, \ldots, n. \tag{5}$$

For the quadratic adaptive functions with the linear social price scheme given
in (2), the socially optimal solution is obtained as the solution of the
following system of linear equations:

$$(B + B^T)x = a \tag{6}$$

where $B^T$ is the transpose matrix of $B$.

We especially consider the case in which the interaction matrix $B$ is
symmetric, with all diagonal elements the same, i.e., $b_{ii} = d$, and all
off-diagonal elements the same, $b_{ij} = b$ $(0 < b < d)$,
$i, j = 1, 2, \ldots, n$. The column vector $a$ also has identical elements,
i.e., $a_i = a$, $i = 1, 2, \ldots, n$.

The level of adaptation of each agent at the competitive equilibrium is
obtained as follows:

$$U_i^{*}(n) = a^2 d/\{2d + b(n-1)\}^2 \tag{7}$$

The level of adaptation as a society, which is defined as the summation of the
adaptive level of each agent, is then given as

$$G^{*}(n) = \sum_{i=1}^{n} U_i^{*}(n) = a^2 d n/\{2d + b(n-1)\}^2 \tag{8}$$

The level of adaptation of each agent at the socially optimal behavior is
given as

$$U_i^{o}(n) = a^2/4\{d + b(n-1)\} \tag{9}$$

The level of adaptation as a society, which is defined as the summation of the
adaptive level of each agent, is then given as

$$G^{o}(n) = \sum_{i=1}^{n} U_i^{o}(n) = a^2 n/4\{d + b(n-1)\} \tag{10}$$
Here we are interested in how the adaptiveness of the whole organization is
affected as the size of the organization increases, i.e., we investigate the
asymptotic value of the summation of the adaptive functions of the agents as
the number of agents increases. Taking the limits of the social adaptive
functions with respect to the number of economic agents, the values converge
as follows:

$$\lim_{n \to \infty} G^{*}(n) = 0 \tag{11}$$

$$\lim_{n \to \infty} G^{o}(n) = a^2/4b \tag{12}$$

This implies that the level of adaptation under competitive behaviors
converges to zero, and that under socially optimal behaviors converges to a
constant.
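The closed forms (8) and (10) can be checked numerically by solving the linear
systems (3) and (6) for the symmetric case. The sketch below uses a small
Gaussian-elimination helper so that it needs no external libraries; the
function names and the default values $a = 300$, $d = 1$, $b = 0.1$ (taken
from the simulation conditions later in the paper) are choices made for the
example.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def society_levels(n, a=300.0, d=1.0, b=0.1):
    """Return (competitive, socially optimal) total adaptation for n agents."""
    B = [[d if i == j else b for j in range(n)] for i in range(n)]
    rhs = [a] * n
    competitive = [[B[i][j] + (B[i][i] if i == j else 0.0) for j in range(n)]
                   for i in range(n)]                      # B + B_1, Eq. (3)
    social = [[B[i][j] + B[j][i] for j in range(n)] for i in range(n)]  # Eq. (6)

    def total(x):
        # Sum of U_i = x_i * P_i with the price scheme of Eq. (2).
        return sum(x[i] * (a - sum(B[i][j] * x[j] for j in range(n)))
                   for i in range(n))

    return total(solve(competitive, rhs)), total(solve(social, rhs))
```

Evaluating `society_levels` for increasing `n` shows the competitive total
shrinking while the socially optimal total approaches $a^2/4b$, in line with
(11) and (12).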

3 Learning of Social Adaptive Function 

In the previous section, we showed that the conditions of individual
optimality and social optimality are different. This implies that if each
economic agent seeks its own optimality, the level of adaptation decreases as
the number of agents in a society increases. Our question is then stated as
follows: how will the evolution of cooperation proceed, and how can the
emergence of cooperation take place in a society?

We now consider the following modified adaptive function for each agent
$A_i \in G$:

$$\tilde{U}_i(x_i, x(i)) = U_i(x_i, x(i)) - \lambda_i(x(i))\, x_i \tag{13}$$

The adaptive function of each agent defined in (13) consists of two terms, the
private adaptive function and the social adaptive function. By taking the
derivative of the modified adaptive function (13) with respect to $x_i$, we
obtain

$$\partial \tilde{U}_i/\partial x_i = \partial U_i/\partial x_i - \lambda_i(x(i)) \tag{14}$$

If we set $\lambda_i(x(i))$ in (14) as in (15), the condition of individual
optimality under the modified adaptive functions is equivalent to the
condition of social optimality defined over the set of the original adaptive
functions in (5).

$$\lambda_i(x(i)) = -\sum_{j \ne i} (\partial^2 U_j/\partial x_j \partial x_i)\, x_j = \sum_{j \ne i} b_{ji} x_j \tag{15}$$

We term $\lambda_i(x(i))$ the level of sympathy of the $i$-th economic agent.
The sympathy level indicates the level of influence of the decision of the
$i$-th agent on the adaptive functions of the other economic agents.

The condition of individual optimality, taking the sympathy level into
account, is given as

$$M_i(x_i, x(i)) - \lambda_i(x(i)) = 0, \quad i = 1, 2, \ldots, n \tag{16}$$

where $M_i(x_i, x(i))$ denotes the derivative $\partial U_i/\partial x_i$ of
the adaptive function.
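The equivalence between the sympathy-adjusted individual condition and the
social optimality condition can be verified directly for the linear price
scheme (2). The following is only a worked restatement of (5), (14) and (15)
using $U_j = x_j P_j$; it introduces no new assumptions.

```latex
\begin{aligned}
\frac{\partial U_j}{\partial x_i} &= -b_{ji}\,x_j \quad (j \neq i), \qquad
\frac{\partial^2 U_j}{\partial x_j\,\partial x_i} = -b_{ji},\\
\frac{\partial S}{\partial x_i}
  &= \frac{\partial U_i}{\partial x_i} + \sum_{j \neq i}\frac{\partial U_j}{\partial x_i}
   = \frac{\partial U_i}{\partial x_i} - \sum_{j \neq i} b_{ji}\,x_j
   = \frac{\partial U_i}{\partial x_i} - \lambda_i(x(i)).
\end{aligned}
```

Setting the last expression to zero gives exactly the individual optimality
condition with sympathy, so each agent that subtracts its sympathy level from
its own marginal gain is in effect following the social gradient.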

The emergence of such social competence can take place without any commitment
among selfish agents. In a society, economic agents are driven by their own
selfish motivations, which lead them to learn the rules of decentralized
decision-making or coordinated behaviors.

Masayuki Ishinishi and Akira Namatame

The process of building up intelligent behaviors and cooperative intentions 
may be called mutual or social learning mm- Social learning from the social 
perspective is grounded in the actions of many agents taken together, and it
is not a matter of individual choice. It is one’s actions in relation to those
of others (and vice versa) that maintain its participation. Social learning is
in this sense the outcome of a web of activity that emerges from the mutual
interactions among agents. In the model of social learning, two types of
learning may occur: the economic agents can learn to cooperate as a group,
while at the same time each agent can also learn on its own by adjusting its
activity level. Social learning would require the exchange of the actions of
the other agents.

The dynamic action selection process must be coordinated to achieve globally
consistent and good actions. We define social learning as the adjustment
process of each agent’s individual economic behavior. The social learning
model describes how each agent, without knowing the others’ adaptive
functions, adjusts its activity level over time and reaches an equilibrium.

Without complete knowledge of other agents, an agent needs to infer the
strategies, knowledge and plans of other agents. Economic agents can put
forward their private knowledge for consideration by other agents based on
their own local interactions, and agents would require the exchange of actions
with other agents. Learning is then formulated as the web of activity that
emerges from the mutual interactions among economic agents. With the
individual learning capability, each agent modifies its decision based on its
current and previous performance in order to optimize its own adaptive
function [2]. This adjustment process generates a partial action that governs
the actions of the agents. The mutual adjustment process of behaviors is
modeled as follows:

$$\text{if } M_i(x_i, x(i)) > \lambda_i(x(i)) \text{ then } x_i := x_i + \delta x_i$$
$$\text{if } M_i(x_i, x(i)) < \lambda_i(x(i)) \text{ then } x_i := x_i - \delta x_i \tag{17}$$

At equilibrium, we have

$$M_i(x_i, x(i)) - \lambda_i(x(i)) = 0, \quad i = 1, 2, \ldots, n. \tag{18}$$

The use of directives by an agent to control another can be viewed as a form
of incremental behavior adjustment [14]. The adjustment process without any
sympathy, obtained by setting $\lambda_i = 0$, $i = 1, 2, \ldots, n$,
converges to a competitive equilibrium.

The mutual adjustment process with sympathy is modeled specifically as
follows:

$$\delta_i = x_i(t+1) - x_i(t) = (\alpha_i/b_{ii})\,[M_i(x_i(t), x(i,t)) - \lambda_i(x(i,t))] \tag{19}$$

where $x(i,t) = (x_1(t), \ldots, x_{i-1}(t), x_{i+1}(t), \ldots, x_n(t))$. The
mutual adjustment process is then given as follows:

$$x_i(t+1) = (\alpha_i/b_{ii}) P_i(t) + (1 - \alpha_i)\, x_i(t) - (\alpha_i/b_{ii})\, \lambda_i(x(i,t)) \tag{20}$$
We also describe the adjustment process of each agent’s sympathy level as
follows:

$$\lambda_i(t+1) = \beta_i\,[M_i(x_i(t), x(i,t)) - \lambda_i(x(i,t))] + \lambda_i(x(i,t)) \tag{21}$$

With the definition of the sympathy level in (15), we have the following
process for learning:

$$\lambda_i(t+1) = \beta_i\{P_i(t) - b_{ii} x_i(t)\} + (1 - \beta_i) \sum_{j \ne i} b_{ji} x_j(t) \tag{22}$$
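The coupled adjustment of activity levels (20) and sympathy levels (22) can be
sketched as a simple synchronous iteration. The sketch assumes identical
agents with $b_{ii}$ and $b_{ij}$ constant, uses the stored sympathy level of
the previous step in the activity update, and starts all agents from the same
hypothetical initial activity `x0` with zero sympathy; these choices, and the
function name, are assumptions made for the example. The defaults
$n = 30$, $a = 300$, $b_{ii} = 1$, $b_{ij} = 0.1$ and
$\alpha_i = \beta_i = 0.1$ follow the simulation conditions given later.

```python
def simulate(n=30, a=300.0, b_ii=1.0, b_ij=0.1, alpha=0.1, beta=0.1,
             steps=200, x0=10.0, sympathy=True):
    """Iterate the mutual adjustment process and return the final activities."""
    x = [x0] * n
    lam = [0.0] * n
    for _ in range(steps):
        total = sum(x)
        new_x, new_lam = [], []
        for i in range(n):
            others = total - x[i]
            price = a - b_ii * x[i] - b_ij * others  # P_i, Eq. (2)
            marginal = price - b_ii * x[i]           # M_i = dU_i/dx_i
            # Activity update, Eq. (20), written in the incremental form (19).
            new_x.append(x[i] + (alpha / b_ii) * (marginal - lam[i]))
            if sympathy:
                # Sympathy update, Eq. (22).
                new_lam.append(beta * marginal + (1.0 - beta) * b_ij * others)
            else:
                new_lam.append(0.0)
        x, lam = new_x, new_lam
    return x
```

With sympathy the iteration settles at the socially optimal activity
$a / 2\{d + b(n-1)\}$, and with the sympathy level held at zero it settles at
the competitive activity $a / \{2d + b(n-1)\}$, which matches the two
equilibria derived in Section 2.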

The activity level of each agent should be determined solely by how its
decision affects other members in the same society and how the decisions of
other agents affect its own adaptive function. However, in a large society, it
may be difficult for each agent to consider its interactions with every other
agent. Therefore, we assume the following symmetric condition for mutual
interactions:

$$b_{ji}/b_{ii} = k \quad (0 < k < 1), \quad j = 1, 2, \ldots, n, \; j \ne i \tag{23}$$

The mutual adjustment process of behaviors based on goal-seeking with sympathy
is then modeled as follows:

$$\lambda_i(t+1) = \beta_i\{P_i(t) - b_{ii} x_i(t)\} + (1 - \beta_i)\, b_{ii} k \sum_{j \ne i} x_j(t) \tag{24}$$

4 Some Simulation Results 

The goal of the research is to understand the competitive interactions, based
on self-interested motivations, which produce purposive and optimal collective
behavior. In this section, we address the question of how a society of
economic agents with different internal models can achieve complex collective
behaviors as a whole. We especially address the following questions: How will
the internal model of each economic agent affect the evolution of their
collective behaviors? How will the collective behavior of economic agents
proceed when the combination patterns of different types of agents are
changed? In order to answer those questions, we carried out simulations under
the following conditions.

(Simulation conditions)

(1) number of agents: 30

(2) social price scheme: $a_i = 300$, $b_{ii} = 1$, $b_{ij} = 0.1$

(3) initial action of each agentoei?!’oe(i35 

The following figures show the change of the adaptation level over time.

(Case 1) $\alpha_i = 0.1$ (slow to adjust to the market price) and
$\beta_i = 0.1$ (low learning speed)

Fig. 1 and Fig. 2 show the level of adaptation of an individual and of the
whole society. Fig. 1 shows the level of adaptation with sympathy, and Fig. 2
shows the level of adaptation without sympathy. In this simulation, each
individual can increase its adaptation level by having sympathy toward the
other agents.
Adaptation Level of Individual

(a) The adaptive level of individual

Fig. 1. The change of adaptation level (with sympathy)

Adaptation Level of Group

Fig. 2. The change of adaptation level (without sympathy)

Fig. 3. The change of adaptation level: slow convergence



(Case 2) $\alpha_i = 0.1$ (slow to adjust to the market price) and
$\beta_i = 0.1$ (high learning speed)

Fig. 3 shows the case of a high learning speed for the sympathy factor, in
which the level of adaptation of each agent converges slowly.

5 Conclusion 

The goal of the research is to understand the types of simple local
interactions which produce complex and purposive group behaviors. We
formulated and analyzed the social learning process of independent economic
agents. We showed that cooperative behaviors can be realized through purposive
local interactions based on each individual’s goal-seeking. Each economic
agent does not need to express its adaptive function, nor to have a priori
knowledge of those of others. Each economic agent adapts its action to the
actions of the other agents.
Abstract. The concept of an adaptation (fitness) landscape has been used to
explain evolutionary processes. The landscape is a response surface for the
genetic space defined by a genotype-environment system, and the evolution of
populations through natural selection is a search for higher peaks in this
space. This is an appealing framework for other disciplines interested in
issues of search and optimisation. One such application is the genetic
improvement of traits in plant breeding. Here, breeding programs can be viewed
and analysed as search strategies that are used to explore the surface of an
adaptation landscape to find higher adaptive positions. The current
theoretical framework considers genetic improvement as a hill-climbing process
on a smooth, single-peaked adaptation landscape. However, there is strong
evidence to suggest that, due to the effects of genotype-by-environment (GxE)
interactions and epistasis, the landscapes encountered by plant breeders are
in fact rugged and multi-peaked. Simulation methodology was used to compare
two selection strategies currently used in plant breeding and to investigate
their capacity to confront the difficulties associated with the influences of
GxE interaction and epistasis: (i) selection of genotypes based on performance
in a single environment (mass selection), and (ii) selection of genotypes
based on performance in several environments (multi-environment testing). A
third selection strategy was proposed for genetic improvement on more complex
adaptation landscapes. This selection strategy (shifting search strategy) was
based on Wright’s ‘Shifting Balance Theory’.

Keywords: fitness, adaptation, landscape, GxE interaction, epistasis,
plant breeding



1 Introduction 

Breeding strategies applied to the genetic improvement of plants in agriculture 
can be considered as search strategies seeking particular combinations of genes 
(genotypes) to improve traits of commercial significance. The majority of these 
traits are quantitative and under the control of many genes. For each gene there 
are alternative forms, referred to as alleles. These alleles combine to generate the 
different genotypes possible for a single gene at a locus on a chromosome. Com- 
bining this variation across genes rapidly generates large numbers of possible 
genotypes. Therefore, any search for a new genotype is a complex combinatorial 
problem where the numbers preclude evaluation of all possible genotypes. In
addition, the expression of the genes is influenced by environmental conditions,
which vary within a target population of environments (TPE). The genetic varia- 
tion within the gene pool available to a breeding program and the environmental 
variation within the TPE combine to generate a complex genotype-environment 
system. Improved genotypes for this system are sought by applying artificial 
selection strategies that aim to increase the frequency of genes contributing to 
enhanced performance of the traits. 

Any search for improved genotypes is further complicated by our lack of un- 
derstanding of the genetic control of these quantitative traits. Theoretical con- 
siderations, experimental investigations and experience from applied breeding 
programs indicate that the relative effectiveness of alternative breeding strate- 
gies will depend on the nature of this genetic control. For quantitative traits 
the importance of both genotype-by-environment (GxE) interactions and gene- 
by-gene interactions (epistasis) is of particular interest. GxE interactions occur 
when there is a change in the relative performance of genotypes when the geno- 
types are exposed to different environmental conditions. Epistasis occurs when 
the contributions to a trait by the genotypes of one gene are influenced by the 
genotypes of other genes. Both of these sources of interaction complicate the 
nature of the genotype-environment system and increase the degree of difficulty 
of the search for improved genotypes. 

R.A. Fisher and S. Wright debated similar issues in relation to evolution and 
natural selection [1]. Fisher proposed that a gene can be deemed favourable or
unfavourable in terms of its average effect within a genotype-environment sys- 
tem. Therefore, adopting the Fisher model, breeding programs should operate to 
increase the frequency of the favourable genes. Wright proposed an alternative 
model where epistasis had a stronger influence on the value of genes. He sug- 
gested that the relative performance of genotypes can be viewed in terms of the 
concept of a landscape with multiple peaks. Therefore, with the Wright model it 
can be argued that breeding programs should operate to exploit local peaks by 
selecting specific desirable epistatic combinations of genes but at the same time 
maintain a capacity to search for new higher peaks. Fisher’s model, considered in 
terms of Wright’s landscape concept, is a simplification of the shape of the land- 
scape which assumes a single peak. If there was a single peak, breeding programs 
should operate to climb it as rapidly as possible without allocating resources to 
search for new peaks. Historically the Fisher model has dominated much of the 
thinking and principles used in the design of applied breeding strategies. 

The availability of powerful tools to investigate the genetic control of traits 
at the molecular level is contributing to increasing awareness of their complex- 
ity. With this awareness the issue of what is an appropriate genetic model for 
the design of breeding strategies is resurfacing. There is a growing body of ev- 
idence suggesting a greater importance of epistasis and GxE interaction than 
was previously thought. Improving our understanding of the relationship be- 
tween the structure of the underlying adaptation landscape and the effective- 
ness of alternative search strategies represented by breeding programs provides 
a basis for designing and implementing selection strategies that optimise
response to selection for different genotype-environment systems. We have used
computer simulation to investigate the effectiveness of plant breeding strate- 
gies for genotype-environment systems that are influenced by both epistasis and 
GxE interactions [^. The objective of this paper is to examine the relative effi- 
ciency of plant breeding strategies that take into consideration the influence of 
GxE interactions and epistasis on the relative performance of genotypes for a 
quantitative trait. 



2 Materials and Methods 

A computer simulation experiment was conducted using the QU-GENE (Quan- 
titative GENEtics) simulation platform [5]. The QU-GENE simulation platform 
enables the design of E(N:K) models for genotype-environment systems; E is
the number of different types of environments in the TPE, N is the number
of genes and K is a measure of the level of epistasis in the model. The
E(N:K) notation indicates that different N:K genetic models are nested within
the different types of environments encountered in the TPE, generating GxE
interaction. The E(N:K) framework is a more general treatment of Kauffman’s
NK model and incorporates the influences of both GxE interaction and epistasis.
The E(N:K) model provides flexibility for investigating a wide range of genetic
models, ranging from smooth single-peaked landscapes (no GxE interaction or
epistasis: E = 1; K = 0) to rugged multi-peaked landscapes (GxE interaction
and epistasis: E > 1; K > 0).

The architecture of the QU-GENE platform consists of two major compo- 
nents: (i) the engine that is used to define the genetic model (based on a diploid 
system) for the genotype-environment system, and (ii) the application modules 
that are used to investigate, analyse or manipulate populations of genotypes 
within the defined genotype-environment system [2]. The engine generates an 
E(N:K) adaptation landscape by defining the performance values of all possible 
genotypes in each type of environment. For this simulation experiment, the al- 
location of performance values to individual genotypes was based on the fitness 
definition used for the NK model by Kauffman [3]. Here, the fitness
(performance) of a genotype in the TPE (W) was defined as:

    W = \sum_{j=1}^{E} c_j \, \frac{1}{N} \sum_{i=1}^{N} w_{ij}

where E is the number of environment types in the TPE, N is the number of
genes, $c_j$ is the frequency of occurrence of environment type j in the TPE and
$w_{ij}$ is the fitness contribution of the i-th gene in environment type j, drawn
at random from the uniform distribution between 0 and 1. For each epistatic
combination, an independent fitness contribution ($w_{ij}$) was defined for locus i.
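
The fitness definition above can be illustrated with a small sketch. It is not the QU-GENE engine: the function name make_enk_landscape, the use of three genotype classes per locus, the random choice of epistatic partners and the lazy drawing of contributions are all assumptions made for illustration only.

```python
import random


def make_enk_landscape(E, N, K, states=3, seed=0):
    """Return a fitness function for a toy E(N:K) landscape.

    Assumptions of this sketch (not taken from QU-GENE): each locus has
    `states` genotype classes, the K epistatic partners of a locus are
    chosen at random, and contributions are drawn lazily on first use.
    """
    rng = random.Random(seed)
    partners = [rng.sample([m for m in range(N) if m != i], K)
                for i in range(N)]
    # tables[j][i] maps an epistatic combination to its contribution w_ij.
    tables = [[{} for _ in range(N)] for _ in range(E)]

    def contribution(j, i, genotype):
        key = (genotype[i],) + tuple(genotype[m] for m in partners[i])
        table = tables[j][i]
        if key not in table:
            table[key] = rng.random()  # uniform on [0, 1)
        return table[key]

    def fitness(genotype, frequencies):
        """Frequency-weighted mean contribution over environment types."""
        return sum(
            c_j * sum(contribution(j, i, genotype) for i in range(N)) / N
            for j, c_j in enumerate(frequencies)
        )

    return fitness


# Example: two environment types of equal frequency, six loci, no epistasis.
fitness = make_enk_landscape(E=2, N=6, K=0, seed=1)
value = fitness((0, 1, 2, 0, 1, 2), [0.5, 0.5])
```

With K = 0 each locus contributes independently, so the landscape within an environment type is additive; raising K makes the contribution of a locus depend on its partners, which is what produces ruggedness.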

An application module (LANDS) was developed to improve population fit- 
ness using three different recurrent selection strategies: (i) mass selection (MASS), 
(ii) multi-environment testing (MET) and (iii) shifting search strategy (SSS). 
Fig. 1. Schematic outline of three selection strategies: (a) mass selection (MASS) and
multi-environment trials (MET), (b) shifting search strategy (SSS).



Fig. 1a represents a schematic outline of the MASS and MET selection
strategies. Using the individuals generated by the engine as an initial population,
recurrent selection proceeds by evaluating the performance of the genotypes, 
identifying a select group of individuals and randomly intermating the select 
group to generate a population for the next cycle. For MASS selection, the eval- 
uation of genotypes was based on performance in a single environment type sam- 
pled at random from the TPE. For MET selection, the evaluation of genotypes 
was based on performance in multiple environment types sampled at random 
from the TPE. The third selection strategy (SSS) was based on Wright’s [4] 
‘Shifting Balance Theory’ (Fig. 1b). As with the MET selection strategy, the
population of genotypes was evaluated in multiple environment types, a select
group of genotypes identified and randomly intermated. However, these cycles of
population improvement were interspersed with phases of subdivision. Here, the
population was divided into smaller sub-populations where evaluation, selection 
and intermating were independently conducted. After a number of cycles these 
sub-populations were combined into a single population where the process was 
continued as for the MET strategy. 
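
One cycle of the recurrent selection described above (evaluate, select, randomly intermate) can be sketched as follows. This is a minimal illustration, not the LANDS module: the function name selection_cycle is hypothetical, and genotypes are treated as haploid tuples recombined uniformly, whereas the source models a diploid system.

```python
import random


def selection_cycle(population, evaluate, select_fraction=0.2, rng=random):
    """One cycle: evaluate, keep the top fraction, randomly intermate."""
    ranked = sorted(population, key=evaluate, reverse=True)
    n_selected = max(2, int(len(ranked) * select_fraction))
    parents = ranked[:n_selected]
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.sample(parents, 2)
        # Uniform recombination of two haploid parents (a simplification;
        # the source describes a diploid genetic model).
        child = tuple(rng.choice(pair) for pair in zip(mother, father))
        offspring.append(child)
    return offspring
```

Under this sketch, MASS and MET differ only in the evaluate function that is passed in: performance in one sampled environment type for MASS, or performance averaged over several sampled environment types for MET.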

Using the LANDS module, the performance of the three selection strategies
(MASS, MET, SSS) was evaluated for E(N:K) landscapes of increasing
complexity. Genotype-environment systems based on 20 genes (N = 20), eight levels
of K (K = 0, 1, ..., 7) and four levels of E (E = 1, 2, 5, 10) were considered,
resulting in 32 different E(N:K) models. For each model, 250 independent runs
(10 different starting populations; 25 runs of each) were conducted for each
selection strategy. Each run was conducted for 50 cycles using a population of
500 genotypes (starting populations constructed to have a fitness level of 0.5) 
with the top 20% selected at each cycle. During the subdivision phases of the 
SSS, the population was divided into 20 sub-populations (25 genotypes in each). 
The subdivision phases were conducted in blocks of five cycles interspersed with 
blocks of ten cycles of single population improvement (Fig. 1b). For the MASS
selection strategy, genotypes were evaluated in a single environment type. For 
the MET and SSS, genotypes were evaluated across ten environments sampled 
at random from the TPE. The performance of the three selection strategies for 
each of the E(N:K) models was evaluated as the average fitness of the population 
for the 250 runs over the 50 cycles. 
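
The alternation of phases in the SSS can be sketched as a simple schedule. The block lengths (ten and five cycles) and the subdivision into 20 sub-populations of 25 genotypes come from the text; the assumption that a run starts with a single-population block, and the helper names sss_phase and split_population, are illustrative only.

```python
def sss_phase(cycle, single_block=10, subdivided_block=5):
    """Return 'single' or 'subdivided' for a 1-based cycle number.

    Assumes each run starts with a block of single-population cycles.
    """
    position = (cycle - 1) % (single_block + subdivided_block)
    return "single" if position < single_block else "subdivided"


def split_population(population, n_subpopulations=20):
    """Divide a population into equally sized sub-populations."""
    size = len(population) // n_subpopulations
    return [population[k * size:(k + 1) * size]
            for k in range(n_subpopulations)]


# Over a 50-cycle run, count the cycles spent in the subdivided phase.
subdivided_cycles = sum(sss_phase(c) == "subdivided" for c in range(1, 51))
```

Under these assumptions a 50-cycle run contains three subdivision blocks, at cycles 11 to 15, 26 to 30 and 41 to 45.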

3 Results 

The simulation experiment indicated there were significant interactions between 
the relative efficiency of the three selection strategies and two major parameters 
of the E(N:K) model, E (GxE interaction) and K (epistasis). For smooth single
peaked landscapes (E = 1; K = 0), all three selection strategies achieved similar
response to selection. However, with the introduction of GxE interaction and 
epistasis into the E(N:K) framework, alternative response profiles were observed 
for the three strategies. 

Fig. 2 displays the average performance of the three selection strategies 
for four levels of GxE interaction (E = 1, 2, 5, 10) and no epistasis (K = 0).
As the level of GxE interaction introduced into the system was increased, the 
population fitness achieved by the three selection strategies decreased (Fig. 2a- 
d). The relative efficiency of the three selection strategies within levels of E was 
not constant. For the model containing no GxE interaction, E(N:K)=1(20:0), 
the three selection strategies achieved a similar response (Fig. 2a). However, as 
GxE interaction was introduced (Fig. 2b-d), the two selection strategies using
multi-environment trials (MET and SSS) were more efficient than the single 
environment trial (MASS). The level of relative improvement of the MET and 
SSS strategies increased as the level of E was raised. 

Fig. 3 displays the average performance of the selection strategies for four 
levels of epistasis (K = 1, 3, 5, 7) and no GxE interaction (E = 1). Here, the
MET selection strategy offered no improvement over the MASS selection strat- 
egy. However, the strategy based on phases of subdivision (SSS) achieved a higher 
level of fitness. Furthermore, the level of relative improvement increased with the 
amount of epistasis (Fig. 3a-d). Unlike the smooth curvilinear average fitness 
response of the MASS and MET selection strategies, the SSS displayed phases 
of rapid improvement interspersed with cycles of sharp decrease in fitness. The 
fitness profile of the SSS can be attributed to the different components of the 
shifting search. Here, the independent searches conducted during the subdivision 
phases of the SSS enabled smaller sub-populations to explore different regions of 
the adaptation landscape. Due to the large amount of genetic variability among 
Fig. 2. Response to selection of the MASS, MET and SSS strategies for four levels of
E and K = 0: (a) E = 1, (b) E = 2, (c) E = 5 and (d) E = 10.




[Figure panels; horizontal axis: Cycle (0 to 50)]




Fig. 3. Response to selection of the MASS, MET and SSS strategies for four levels of
K and E = 1: (a) K = 1, (b) K = 3, (c) K = 5 and (d) K = 7.



sub-populations, the combination of these independent searches into a single 
population initially reduced population fitness. However, the exposure to alter- 
native regions of the landscape provided an opportunity for the SSS to increase 
performance relative to the MASS and MET strategies. 
Fig. 4. Response to selection of the MASS, MET and SSS strategies at cycle 50 for
four levels of E and five levels of K: (a) E = 1, (b) E = 2, (c) E = 5 and (d) E = 10.



Fig. 4 displays the average performance of the selection strategies at cycle 
50 for combinations of both GxE interaction and epistasis. The three selection 
strategies achieved an improved fitness level, relative to the starting population 
(0.5), for all of the E(N:K) models considered. However, the introduction and 
increase in levels of GxE interaction and epistasis influenced the relative effec- 
tiveness of the selection strategies. For models containing no GxE interaction 
(Fig. 4a), the MASS and MET strategies achieved similar levels of response for 
all levels of K. However, the efficiency of the MET strategy increased relative 
to MASS with the introduction of GxE interaction into the system (Fig. 4a-d).
For models containing no epistasis (Fig. 4a-d; K = 0), the MET and SSS
strategies achieved similar levels of response for all levels of E. However, the 
efficiency of the SSS strategy increased relative to MET with the introduction 
of epistasis into the system (Fig. 4a-d). 



4 Discussion 

As the complexities of the genome are exposed there is an increasing aware- 
ness that many of the simplifying assumptions used to mathematically model 
genotype-environment systems are difficult to sustain. The E(N:K) model pro- 
vides a framework for relaxing many of these assumptions, in particular those 
related to epistasis and GxE interactions. Plant breeding strategies apply artifi- 
cial selection to finite samples of genotypes and seek to identify those genotypes 
with higher performance values for a given genotype-environment system. As 
the levels of epistasis and GxE interaction increase, the shape of the adaptation
landscape on which this search takes place becomes more complex. We are
investigating the relative merits of alternative breeding strategies in terms of 
their ability to search for improved genotypes on these landscapes. An impor- 
tant finding is that directional selection is an extremely powerful search strategy 
across a wide range of levels of complexity in the genotype-environment systems 
and their associated adaptation landscapes. For example, in the present study 
mass selection was capable of making improvement for all of the E(N:K) mod- 
els considered, albeit gradual for the most complex. However, it is clear that 
selection strategies which take into account important features of the shape of 
the adaptation landscape can improve the effectiveness of the search. The use 
of multi-environment trials (MET and SSS) was more effective than the single 
environment trial (MASS) as GxE interaction was introduced to the system. 
This is a common strategy used in applied plant breeding to deal with this 
source of complexity. The use of the shifting search strategy (SSS) introduced 
a further improvement to the search over MET as the level of epistasis in the 
system increased. The role of this sort of search strategy has been discussed in 
relation to evolution in natural systems but only speculated on in agricultural 
systems. However, at the global level the flow of genetic resources from small 
local breeding programs to the larger breeding programs of the international 
centres and the subsequent flow of new germplasm back to the local programs 
may be viewed as a form of shifting search strategy. This suggestion and the
results of our simulation studies identify avenues for investigation of the
efficient use of genetic resources in plant breeding.

While to date we have confined our investigations to random genetic net- 
works, the resulting genotype-environment systems show many emergent prop- 
erties that are observed in practice when plant breeding programs search real 
biophysical systems. These parallels, and the limited ability of our current quan- 
titative genetic theory to predict significant genetic improvements for quantita- 
tive traits, provide much food for thought in relation to the role of plant breeding 
in the quest for food security and sustainable agricultural systems. 
Abstract. Equations for calculating the control force of a movable
inverted pendulum are generated directly with Genetic Programming (GP).
The task of a movable inverted pendulum is to control the force applied
to a cart on which a pole is hinged, not only to keep the pole standing
but also to move the cart to an arbitrary target position.

The experiments yield intelligent control equations that can lean the
pole toward a target position by pulling the cart in the opposite
direction, and then move the cart to the target while keeping the pole
inverted. They are also robust enough to move the cart, with the pole
standing, to a new target position when the target is changed, even if
the cart is still moving to the old target position.

The robustness for this problem is defined experimentally, and the
appropriate value of the parsimony factor in GP is identified to obtain
control equations that are both robust and simple.



1 Introduction 

The pole-balancing problem has been attacked many times with methods such
as evolutionary fuzzy logic [1] and evolutionary neural networks [2].
However, approaches to this problem with Genetic Programming (GP) have
hardly been reported, except in the eleventh chapter of the book “Genetic
Programming” by John Koza [3]. His approach is to evolve equations that
determine the direction of the bang-bang force applied to a cart.

In contrast, the objective of this paper is to evolve an equation with GP that
calculates the magnitude of the driving force of a cart, so that the cart moves
to given target positions while keeping a pole standing on it. This differs
from the general pole-balancing problem, whose objective is only to keep a pole
standing. Moreover, the robustness of a control equation is experimentally
defined and evaluated, in order to realize robust control that can respond to a
change of the target position while the cart is moving to an old target.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 179- tl8^ 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999
2 Model of Inverted Pendulum

The model of the pole-balancing problem in this study is simulated in two- 
dimensional space, and no friction of the hinge or sliding of the cart is assumed. 
The equations of motion given by Anderson are simulated at discrete times. 
The velocity of the cart and the angular velocity of the pole at time t+1 are cal- 
culated with the Runge-Kutta approximation method. The position of the cart 
and the angle of the pole are calculated with the Euler approximation method. 
For these simulations, the constants are the time step ($\Delta t$ = 0.02 seconds), the
mass of the cart ($m_c$ = 1.0 kg), the mass of the pole ($m_p$ = 0.1 kg), the pole
length ($l$ = 1.0 m) and gravity ($g$ = 9.8 m/s^2).
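
A minimal sketch of one simulation step is given below, using the constants listed above. Two points are assumptions rather than details from this paper: the accelerations are written in the frictionless cart-pole form commonly attributed to Barto, Sutton and Anderson, in which the length term is half the pole length, and all four state variables are advanced with a plain Euler step, whereas the text uses a Runge-Kutta approximation for the velocities. The function name cart_pole_step is hypothetical.

```python
import math

GRAVITY = 9.8      # m/s^2, from the text
MASS_CART = 1.0    # kg, from the text
MASS_POLE = 0.1    # kg, from the text
POLE_LENGTH = 1.0  # m, from the text
DT = 0.02          # s, from the text


def cart_pole_step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one time step.

    theta is in radians, measured from the upright position.
    """
    x, x_dot, theta, theta_dot = state
    half_length = POLE_LENGTH / 2.0  # assumption: equations use the half-length
    total_mass = MASS_CART + MASS_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + MASS_POLE * half_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        half_length * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass))
    x_acc = temp - MASS_POLE * half_length * theta_acc * cos_t / total_mass

    # Plain Euler update for all four variables (a simplification).
    return (x + DT * x_dot,
            x_dot + DT * x_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc)
```

A controller is then any function that maps the current state and the target position to the force passed to cart_pole_step at each step.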

3 Applying GP to the Inverted Pendulum 

3.1 Function Set and Terminal Set 

In this study, the force to control the cart is directly expressed as an equation
defined by an S-expression tree built from a function set and a terminal set in GP.
For this problem, the function set F and the terminal set T are prepared as follows:

    F = \{+, -, \times, \%\},
    T = \{\dot{\theta}, \theta, \dot{x}, d, 1.0, 10.0, -1.0\},

where % is the modified division defined by Koza [3]. The parameters of the
pole-cart system are $\dot{\theta}$, $\theta$ and $\dot{x}$, which are the angular velocity, the pole
angle and the cart velocity, respectively. Moreover, d is the difference between
the cart and the target positions. The function set is the most elementary set,
consisting of the four arithmetic functions. It was chosen after preparatory
experiments with larger function sets that additionally included the absolute
value, square root, exponential and sine functions. In these experiments, the
more complex functions were rarely used in successful solutions.

The driving force of the cart is calculated at each time step from the status
of the pole-cart system by the equation tree consisting of the elements of the
function set and the terminal set.
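
How such an equation tree yields a force can be sketched as follows. The nested-tuple representation, the variable names and the function names are illustrative assumptions; the only element taken from Koza's definition is that the modified (protected) division returns 1 when the divisor is zero.

```python
def protected_div(a, b):
    """Koza-style protected division: returns 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b


FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": protected_div,
}


def evaluate(tree, variables):
    """Evaluate a tree given as nested tuples (op, left, right).

    Leaves are either variable names (str) or numeric constants.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, variables),
                             evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return float(tree)


# Hypothetical tree: 10.0 * theta + d % x_dot
tree = ("+", ("*", 10.0, "theta"), ("%", "d", "x_dot"))
state = {"theta_dot": 0.0, "theta": 0.1, "x_dot": 0.0, "d": 2.0}
force = evaluate(tree, state)  # d % 0 evaluates to 1 by protected division
```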



3.2 Fitness Function 

The purpose of this study is to search for a control equation which will move the 
cart to the given target positions while keeping the pole standing. The target 
position is a variable in the control process. However, it is difficult to define 
the fitness function that evaluates the control performance in such a dynamic 
environment with variable target positions. 

Therefore, the fitness function is defined as the control problem of moving 
the cart to a fixed target position while keeping the pole standing. It is defined 
as a minimum search problem as follows: 
    fitness = \sum_{t=1}^{\ldots} \left( \omega_1 |\theta(t)| + \omega_2 |x(t) - T| \right) + \ldots    (1)

    t_s = \begin{cases}
            \ldots & (\ldots + \dot{\theta}^2(t_s) + \dot{x}^2(t_s) < \varepsilon \ \text{and}\ |x(t_s) - T| < \delta), \\
            \mathrm{STEP} & (\text{otherwise}),
          \end{cases}    (2)



where T is the target position, $\varepsilon$ is a small constant used to decide the
stationary state of the pole, and $\delta$ is the allowance for the error between the
cart and the target positions. In addition, $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are weight
constants given in Table 1. STEP is the maximum number of simulation steps, which
means that the maximum simulation period is STEP x 0.02 seconds. Moreover, $t_0$ is
the first instant at which the condition $|\theta(t_0)| \ge \theta_{max}$ or $|x(t_0)| > x_{max}$
is satisfied. The constants are set at $\theta_{max}$ = 45.0 degrees, $x_{max}$ = 15.0 m,
$\varepsilon$ = 10-® and $\delta = 10^{-3}$.

Table 1. Weights for the fitness function

    Weight   $\omega_1$   $\omega_2$   $\omega_3$   $\omega_4$
    Value    0.1          5.0          3.0          10.0



4 Empirical Study 

4.1 Empirical Procedure 

In this empirical study, the SGPC program developed by Walter Alden Tackett
and Aviram Carmi is used for GP simulations. The main parameters of GP are 
set at population = 2,100, maximum generation = 100 and parsimony factor 
= 0.0, 10.0, 30.0, 50.0, 80.0, 100.0, 150.0 and 200.0. The paper of Kinner jB] is 
referred to for the parsimony factor. The parsimony factor is used because it
reduces the number of nodes in a tree during the evolutionary process, and
because it may improve the generalization of an equation tree for the problem.
The tree fitness, which includes the parsimony factor, is defined by Eq. (3).

    tree-fitness = fitness + parsimony factor x (number of nodes in the tree).    (3)
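
Eq. (3) can be sketched as follows, assuming (for illustration only) that a tree is stored as nested tuples of the form (operator, left, right) with variable names or constants as leaves. The function names are hypothetical.

```python
def count_nodes(tree):
    """Count function and terminal nodes in a nested-tuple tree."""
    if isinstance(tree, tuple):
        return 1 + sum(count_nodes(child) for child in tree[1:])
    return 1


def tree_fitness(fitness, tree, parsimony_factor):
    """Eq. (3): raw fitness plus a penalty proportional to tree size."""
    return fitness + parsimony_factor * count_nodes(tree)


# Hypothetical tree with five nodes: +, *, 40.0, theta_dot, d
tree = ("+", ("*", 40.0, "theta_dot"), "d")
penalized = tree_fitness(2.0, tree, parsimony_factor=100.0)
```

Because fitness is minimized, a larger parsimony factor makes every additional node more costly, which pushes evolution toward smaller trees.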
In the evaluation, the initial states and the target position are as follows. 

    \dot{\theta}(0) = 0.0,
    \theta(0) = \{2\theta_{max} + \mathrm{Rnd}(-1.0, 1.0)\} \bmod (2\theta_{max}) - \theta_{max},
    \dot{x}(0) = 0.0,
    x(0) = \mathrm{sign}(\mathrm{Rnd}(-1.0, 1.0)) \times 10.0,
    T = 0.0.

A simulation is done for STEP = 2000, i.e. 40 seconds. 
4.2 Empirical Results

The first objective of the experiments is to investigate the robustness of the
control equations obtained by GP evolution. It is important to examine whether
they work in a wide variety of situations. In this study, the robustness of a
control equation is regarded as how successfully it controls the driving force to
move the cart to a given target position while keeping the pole standing,
starting from initial states with various angular velocities of the pole and
various velocities of the cart. The reason is that the pole-cart system has
various angular velocities and cart velocities at the instant the target
position is changed while the cart is moving to the old target position.

The robustness of a control equation is defined as the success rate in 1,000
simulations of the pole-cart system. Success is defined as controlling the driving
force so that the cart moves to the fixed target position while the pole is kept
standing, for random initial values of the four parameters. The initial values
of the parameters are normally distributed, with the variances given in Table 2.



Table 2. Variances of initial values

    Parameters   $\dot{\theta}$   $\theta$   $\dot{x}$   $x$
    Variance     2.0              12.0       2.5         4.0
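
The robustness measure can be sketched as a Monte Carlo success rate. The variances are those of Table 2; the zero means, the assignment of the variances to the four parameters in the order shown, and the callback succeeds (which stands in for a full pole-cart simulation) are assumptions of this sketch.

```python
import math
import random

# Variances from Table 2; the parameter order is an assumption.
VARIANCES = {"theta_dot": 2.0, "theta": 12.0, "x_dot": 2.5, "x": 4.0}


def estimate_robustness(succeeds, n_trials=1000, variances=VARIANCES, seed=0):
    """Fraction of random initial states for which `succeeds` returns True.

    `succeeds` is a user-supplied function that runs one simulation from
    the given initial state and reports whether the control succeeded.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        # Zero-mean normal initial values (the mean is an assumption).
        initial = {name: rng.gauss(0.0, math.sqrt(var))
                   for name, var in variances.items()}
        if succeeds(initial):
            successes += 1
    return successes / n_trials
```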



Table 3 shows the results of the evolutionary experiments and the robustness
tests for each value of the parsimony factor, over 20 GP evolution experiments.
In this table, the “hit rate” is the fraction of the 20 experiments in which GP
evolution produced a control equation satisfying the condition part of Eq. (2)
during the simulation. The “depth” and “nodes” are the average depth and the
average number of nodes of the trees obtained as the best solutions in the 20
experiments.



Table 3. Results of robust tests for 20 experiments

    parsimony factor   hit rate   robustness   depth   nodes
    0.0                0.75       0.588        16.20   181.6
    10.0               0.90       0.770        12.00   70.8
    30.0               0.70       0.615        9.35    42.5
    50.0               0.75       0.685        9.45    41.0
    80.0               0.70       0.626        8.80    37.7
    100.0              0.90       0.824        9.15    31.5
    150.0              0.75       0.653        7.85    27.4
    200.0              0.65       0.618        5.20    18.6
Fig. 1. The effect of parsimony factors on the robustness



Fig. 1 shows the effect of the parsimony factor on the average (bold line) and
the best (thin line) robustness over 20 experiments. The average robustness has
its highest peak when the parsimony factor is 100, although the variations of
the best robustness are small. The best robustness over all experiments is also
obtained at the parsimony factor of 100. This suggests that, if the parsimony
factor is set properly around 100, solutions with higher robustness are obtained
with the highest probability. That is, there exists an optimal value of the
parsimony factor for the evolution of robustness.
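
The location of the peak can be checked directly against the average robustness values reported in Table 3. The short sketch below only restates those published values; the variable names are illustrative.

```python
# Average robustness per parsimony factor, copied from Table 3.
ROBUSTNESS = {
    0.0: 0.588, 10.0: 0.770, 30.0: 0.615, 50.0: 0.685,
    80.0: 0.626, 100.0: 0.824, 150.0: 0.653, 200.0: 0.618,
}

# Parsimony factors ranked from most to least robust on average.
ranked = sorted(ROBUSTNESS, key=ROBUSTNESS.get, reverse=True)
best_factor = ranked[0]
runner_up = ranked[1]
```

The ranking puts the parsimony factor of 100 first and 10 second, so the average robustness in Table 3 is not a single smooth curve: there is a secondary peak at 10.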

Fig. 2 shows the effect of the parsimony factor on the average number of
nodes and the average depth of the trees for 20 experiments. The number of nodes
and the depth of the equation trees with the best robustness are also shown by
the thinner lines in Fig. 2. In this figure, the average number of nodes and
the average depth decrease almost monotonically, but the depth of the equation
trees with the best robustness has a peak at the parsimony factor of 100. From this,




Fig. 2. The effect of parsimony factors on the number of nodes and the depth 
it is estimated that the equation tree with higher robustness needs fewer nodes
and more depth.

The following Eqs. (4), (5), (6), (7) and (8) are simplified control equations
derived from equation trees with higher robustness, obtained in the 20
evolutionary experiments with parsimony factors of 10, 30, 80, 100 and 200,
respectively.

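    force_1(\dot{\theta}, \theta, \dot{x}, d) = 30\dot{\theta} + 101\theta + 12\dot{x} + 6d - \ldots,    (4)

    force_2(\dot{\theta}, \theta, \dot{x}, d) = 54\dot{\theta} + 118\theta + 54\dot{x} + 5d,    (5)

    force_3(\dot{\theta}, \theta, \dot{x}, d) = 33\dot{\theta} + 99\theta + 10\dot{x} + 4d + \ldots,    (6)

    force_4(\dot{\theta}, \theta, \dot{x}, d) = 40\dot{\theta} + 88\theta + 8\dot{x} + 3d,    (7)

    force_5(\dot{\theta}, \theta, \dot{x}, d) = 19\dot{\theta} + 42\theta + 3\dot{x} + d - \ldots.    (8)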



The depths and the numbers of nodes of the equation trees from which the above
equations are simplified are shown in Table 4.

Table 4. Depths and numbers of nodes of equation trees

    Equation No.   (4)   (5)   (6)   (7)   (8)
    depth          10    10    7     14    6
    nodes          81    57    33    45    31



Eqs. (5) and (7) have the form of a very simple linear combination of the
parameters: the angular velocity, the pole angle, the cart velocity and the
difference between the cart and the target positions. They also have the form of
P-D control equations. Eqs. (4), (6) and (8) are also simple, although they
include quadratic, cubic and biquadratic terms.
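
The linear form described above can be written as a short function. The default gains (40, 88, 8, 3) are a reading of the coefficients of the equation obtained at a parsimony factor of 100 and should be treated as an assumption, as should the units and sign conventions of the inputs; the function name linear_force is hypothetical.

```python
def linear_force(theta_dot, theta, x_dot, d, gains=(40.0, 88.0, 8.0, 3.0)):
    """Linear (P-D style) control law: a weighted sum of the four inputs.

    The default gains are an assumed reading of the source equation.
    """
    k_theta_dot, k_theta, k_x_dot, k_d = gains
    return (k_theta_dot * theta_dot + k_theta * theta
            + k_x_dot * x_dot + k_d * d)


# A linear law is an odd function: negating every input negates the force.
forward = linear_force(1.0, 2.0, 3.0, 4.0)
mirrored = linear_force(-1.0, -2.0, -3.0, -4.0)
```

The odd symmetry is why a purely linear law behaves identically for mirrored initial conditions, so only one sign of the initial cart position needs to be simulated.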

The control processes produced by two of the control equations, the second
being Eq. (7), are shown in Figs. 3 and 4, respectively. For the first of these,
the simulations are executed with two initial values of the cart position:

(i) x(0) = 10.0, (ii) x(0) = -10.0.

The other parameters are set at 0.0 and the target position is also 0.0.




Fig. 3. Transitions of distance, angle and force for Eq. 
Fig. 4. Transitions of distance, angle and force for Eq. (7)



Because control Eq. (7) is an odd function of all parameters of the
pole-cart system, its simulations are executed for two positive initial values
of the cart position:

(iii) x(0) = 10.0, (iv) x(0) = 5.0.

The other parameters are set at 0.0 and the target position is also 0.0.

The times elapsed until the stationary state are 9.86 seconds in case (i) and
9.52 seconds in case (ii) for the equation of Fig. 3, and 13.88 seconds in case
(iii) and 12.92 seconds in case (iv) for Eq. (7).

Figs. 3 and 4 show that the equations control the driving force so that, if
the target position is changed to another position when the pendulum is in the
stationary state, the cart moves to the target position after leaning the pole
toward the target direction by pulling the cart in the opposite direction.

This control pattern is executed even when the pendulum is not in the
stationary state, that is, while it is moving toward the old target position.
This control pattern comes easily to human minds through experience and
learning. However, it has been very difficult for conventional artificial
intelligence to obtain such an intelligent control pattern without human help.
This is one piece of evidence that it is possible to obtain high intelligence by
evolutionary processes. Moreover, it is interesting to note that high
intelligence is obtained with a very simple linear equation.

5 Conclusions 

In this paper, GP is used to evolve equations for the control of the movable
inverted pendulum. In the experiments, robust control equations are obtained
that control the driving force to move a cart to a given target position while
keeping a pole standing.

The robustness for this problem is defined quantitatively and the effectiveness 
of the parsimony factor for the evolutions of robustness is investigated. As a 
result, it is found that there exists an optimal value of the parsimony factor for 
the evolution of robustness. It is also found that solutions of very simple control 
equations have the intelligence to move the cart to the target position after 
leaning the pole toward the target direction by pulling the cart in the opposite 
direction. This study contributes to the evidence that it is possible to obtain 
high intelligence by evolutionary processes. 

Future work is to evolve control equations for a movable inverted pendulum 
in an environment that contains factors such as friction in the hinge and in the 
sliding of the cart, and various kinds of noise. In addition, future studies will 
focus on more difficult control problems and investigate in depth how robustness 
and generalization can be obtained in the evolutionary process. 
Abstract. This paper deals with the problem of nurse rostering in Belgian 
hospitals. This is a highly constrained real-world problem that was 
(until the results of this research were applied) tackled manually. The 
problem basically concerns the assignment of duties to a set of people 
with different qualifications, work regulations and preferences. 
Constraint programming and linear programming techniques can produce 
feasible solutions for this problem. However, the reality in Belgian 
hospitals forced us to use heuristics to deal with the over-constrained 
schedules. An important reason for this decision is the calculation time, 
which the users prefer to reduce. The algorithms presented in this paper 
form part of a commercial nurse rostering product developed for the Belgian 
hospital market, entitled Plane. 

Keywords Nurse rostering, personnel scheduling, tabu search 



1 Introduction 

In this paper we discuss the algorithms that have been developed for the 
commercial nurse rostering system Plane. The development of Plane was based 
on extensive market research in 1993. One of the conclusions was that the 
requirements of Belgian hospitals cannot be met with a cyclic ’three shift’ schedule. 
Recent research done by the Stichting Technologie Vlaanderen [Zj also showed 
that, instead of a cyclic schedule, the nurses prefer an ’ad-hoc schedule’ in which 
they can express their personal wishes and priorities. Because of the size of the 
solution space (the scheduling period is usually one month and the number of 
possible duties per day varies from 6 to 15), the nurse rostering problem tackled 
by Plane differs a lot from other rostering problems described in the literature. 
The planning period in 03 ! is restricted to 1 week and in mm there are only 
three different duties to be planned. 

Plane can decide (per nurse) which duties can or cannot be performed (accord- 
ing to that nurse’s qualification category) when there is not enough personnel 
available. 

Another goal of Plane is to give the user the freedom to define a personal cost 
function by modifying predefined constraints, modifying weight parameters, etc. The 

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 187 fm] 1999. 

(c) Springer- Verlag Berlin Heidelberg 1999 
solution method has to be robust enough to cope with widely varying cost 
functions. 

In 1^ a constraint programming solution for the nurse rostering problem is pre- 
sented. Preliminary experiments with Oz showed that it is very hard to calculate 
monthly schedules that take into account the high number of ’consecutiveness’ 
constraints that Belgian hospitals have to deal with. Also in the mathematical 
approaches of 1418 1 . the number of different constraints is much lower than in 
our problem. 

A heuristic method combining tabu search with algorithms based on manual 
scheduling techniques proved to be very appropriate for this combinatorial 
problem, in which the calculation speed is as important as the attempt to find a 
solution that is close to optimal. 

2 Plane, nurse rostering software for Belgian hospitals 

Plane is a scheduling system developed by Impakt and GET to assist the 
scheduling of personnel in hospitals for which the demands for every 
qualification can be determined over a fixed period in time and which have to fulfil a 
number of constraints, limiting their assignments. 

A description of Plane, its problem domain, its system specific and functional 
requirements can be found in [J . The first version of Plane was first implemented 
in a hospital in 1995 but the system is still evolving to cope with the new and 
more complicated real world problems that keep appearing. So far, several hos- 
pitals in Belgium have replaced the very time consuming manual scheduling by 
this system. 

The cost function used in the algorithms is modular and can deal with all 
constraints matching the types described in section 3. 

3 Problem description 

In general, a ward consists of about 20 people, having different qualifications 
and responsibilities. These people are placed into categories based upon their 
qualifications and job description such as head nurse, regular nurse, nurse aid, 
student, etc. Some of the nurses can replace people from another category 
(depending upon their qualifications). Each replacement by a person from another 
category will raise the evaluation function by an amount the user can set. 

3.1 Hard constraints 

The personnel requirements are expressed in terms of a required number of nurses 
of every category for every duty during the planning period, which is often one 
month. These requirements are the only hard constraints in the problem. Op- 
tionally, the user can choose to plan the minimum number of required personnel 

^ Impakt N.V., Ham 64, B-9000 Gent 

^ GET, General Engineering & Technologie, Antwerpse Steenweg 107, B-2390 Oost- 
malle 
Fig. 1. Diagram of the hybrid tabu search algorithms for the nurse rostering problem 



or the preferable number of personnel. A third option is to plan at least the 
minimal required number of nurses and to add nurses whenever it does not increase 
the evaluation function (’add duties towards preferred personnel requirements’ 
in Fig. 1). 



3.2 Soft constraints 

It is highly exceptional in real world problems to find a schedule that satisfies all 
the soft constraints, but the aim of the algorithms is to minimise the violations of 
these constraints. The constraints are all to be specified by the users of the sys- 
tem. Certain general constraints are recommended by hospital regulations (but 
in certain situations, may need to be ignored). There are other soft constraints 
that are normally created by an agreement between the head nurse (or personnel 
manager) and the individual nurses. At this moment there are about 30 (modifi- 
able) constraints. It is usually the case that not all constraints can be satisfied at 
the same time. When a contradiction between constraints occurs, the personal 
preferences of staff (such as requests for holidays, requests to work a certain duty 
on a certain day) are stronger than any other constraint. A detailed list of the 
constraints in Plane can be found at http://www.impakt.be/plane/indexf.htm. 

4 Tabu search algorithm and variants 

The entire flow diagram of the hybrid tabu search algorithms described in this 
section can be seen in Fig. 1. 

4.1 Feasible initial schedule 

The first part of the scheduling algorithm is the construction of a feasible initial 
solution. For practical planning problems three possible strategies are used: 
Current schedule: This is especially useful when urgent changes in the schedule 
are required. In real life this may happen when a scheduled nurse is suddenly ill 
and has to be replaced and, of course, we do not want to drastically change the 
schedule for the other people. 

Schedule of the previous planning period: This option is useful when the schedule 
in the previous planning period was of very high quality and the constraints on 
the current and the previous planning period are similar. 

Random initialisation: This is the simplest initialisation; it starts from an empty 
schedule. 

After this initialisation, the schedule has to be made feasible. This is carried 
out by randomly adding and/or removing duties for every category until the 
requirements are met. 

Although the first two initial schedule constructors may seem very attractive, 
our experiments show that it is not too difficult for the tabu search algorithm to 
produce schedules of comparable quality starting from a random initial schedule. 
Indeed, it is often the case that with the first two initialisations the algorithm 
is already in a local minimum and has problems escaping from it. 
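
The repair step that makes an initial schedule feasible can be sketched as follows. This is a minimal illustration, not the Plane implementation: the dictionary layout, the name make_feasible and the rule that a nurse works at most one duty per day are assumptions made for the example, and only a single qualification category is considered.

```python
import random


def make_feasible(schedule, nurses, required, rng):
    """Randomly add or remove duties until every requirement is met exactly.

    schedule: dict mapping (day, duty) -> set of nurse names (hypothetical layout)
    required: dict mapping (day, duty) -> number of nurses needed
    """
    for (day, duty), need in required.items():
        assigned = schedule.setdefault((day, duty), set())
        # Too many nurses on this duty: remove randomly chosen ones.
        while len(assigned) > need:
            assigned.remove(rng.choice(sorted(assigned)))
        # Too few: add randomly chosen nurses who are still free on that day
        # (assumption: at most one duty per nurse per day).
        while len(assigned) < need:
            busy = set().union(*(who for (d, _), who in schedule.items() if d == day))
            free = sorted(set(nurses) - busy)
            if not free:
                raise ValueError(f"not enough free nurses on day {day}")
            assigned.add(rng.choice(free))
    return schedule
```

Starting from an empty dictionary corresponds to the random initialisation; passing the current or the previous schedule corresponds to the other two strategies.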

4.2 Original tabu search algorithm 

In the simplest tabu search algorithm, the only move we consider is a move of 
a duty from one person to another on the same day. The move is not allowed if 
the goal person is not of the right category or is already assigned to that duty. 
This will not affect the hard constraints. 

For each category (for each iteration) possible moves will be calculated and the 
move leading to the highest benefit will be performed. If the highest benefit is 
negative, the move will be performed anyway, unless this move is forbidden by 
the tabu list. When a move is accepted, an area in the roster around the roster 
point where the duty comes from and where it is moved to may not be changed. 
For comparison purposes only, we introduced a steepest descent algorithm in 
which the neighbourhood of the moves is exactly the same as in the tabu search 
algorithm. After evaluating all the possible moves in the neighbourhood, the 
best one will be performed, unless this best move does not improve the schedule, 
in which case the algorithm stops. These algorithms turned out not to be 
powerful enough to produce good solutions for complex problems, as is shown in the 
’steepest descent’ and ’tabu search’ experiments in Tables 1 & 2 (section 5). The 
tabu search algorithm performs better than the steepest descent algorithm and 
is therefore used as a local search heuristic in the hybrid algorithms described 
in section 4.4. 
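
The difference between the two acceptance rules can be illustrated with a short sketch. It is a simplification under stated assumptions: a move is a hypothetical tuple (day, duty, from_person, to_person), the gain function is supplied by the caller, and the tabu list simply remembers recent moves, whereas the algorithm described above protects an area of the roster around the moved duty.

```python
from collections import deque


def best_move(moves, gain, tabu=()):
    """Return the non-tabu move with the highest gain, or None if none is allowed."""
    allowed = [m for m in moves if m not in tabu]
    return max(allowed, key=gain) if allowed else None


def tabu_step(moves, gain, tabu):
    """Tabu search: take the best allowed move even when its gain is negative."""
    move = best_move(moves, gain, tabu)
    if move is not None:
        tabu.append(move)  # a deque(maxlen=k) forgets the oldest entry by itself
    return move


def steepest_descent_step(moves, gain):
    """Steepest descent: take the best move only if it improves the schedule."""
    move = best_move(moves, gain)
    return move if move is not None and gain(move) > 0 else None
```

When every move in the neighbourhood has a negative gain, steepest_descent_step returns None and the search stops, while tabu_step still moves on, which is what allows tabu search to leave a local minimum.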



4.3 Some Heuristics for the problem 

Here we describe some heuristics that can be employed (in conjunction with the 
tabu search algorithm) to improve the solution. 



Diversification 1: Complete weekend. Although the users of the program 
can assign a cost parameter to this constraint, it is very hard to find satisfactory 
solutions. The problem is that there are so many constraints and the degree of 
freedom of the problem is so high that the search is likely to find solutions satisfying 
many other constraints but not this one. In the graphical user interface, incomplete 
weekends really catch the eye, while other constraints such as overtime or too 
many morning shifts on Mondays, etc. are not immediately visible. Because it 
is almost impossible to guarantee good solutions with a certain setting of the 
parameters, we decided to solve this problem the hard way, by not caring about 
possible problems for other constraints. 



Diversification 2: Consider the worst personal schedule. If the complete 
weekend function (above) has not changed the schedule, it can be beneficial to 
look at the people with the worst schedule (according to the evaluation function). 
For every person (within the category being scheduled) it is possible to calculate 
the value of the evaluation function after exchanging a part of the schedule of 
the people involved. The parts of the schedule always contain full days and the 
maximum length is half the planning period. After all possibilities have been 
calculated, which is quite time consuming, the best exchange (chosen at random 
from equal values) is performed. This process often results in a 
better solution. 



Greedy shuffling: Model human scheduling techniques. There was a 
problem with the results of the tabu search algorithm because sometimes a human 
could improve the visual result by making a small change. This process 
calculates all possible ’Diversification 2’ (above) moves for every pair of people. 
After listing the gain in the cost function for every possible exchange, the shuffle 
leading to the best improvement will be performed. Afterwards, the next best 
improvement in the list is performed, provided none of the people involved were 
already involved in an earlier shuffle. As long as there are improving exchanges in 
the list, they are carried out. The whole procedure starts over again until none 
of the possible exchanges improves the cost function. 

The improvements on the schedule that can be obtained by employing this 
procedure and tabu search (described below) are considerable, but the biggest 
advantage of this step is that it creates schedules that are almost impossible 
for a human to improve. 
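
One pass of the greedy shuffling can be sketched as below. The sketch assumes that the gains of all candidate exchanges have already been computed and are passed in as a dictionary keyed by a pair of people; the name greedy_shuffle_pass and this data layout are illustrative, not taken from Plane.

```python
def greedy_shuffle_pass(exchanges):
    """Pick improving exchanges best-first, using each person at most once.

    exchanges: dict mapping (person_a, person_b) -> gain in the cost function
    obtained by exchanging part of their schedules (positive = improvement).
    """
    used, chosen = set(), []
    ranked = sorted(exchanges.items(), key=lambda item: item[1], reverse=True)
    for (a, b), gain in ranked:
        if gain <= 0:
            break  # the remaining exchanges do not improve the schedule
        if a in used or b in used:
            continue  # one of the two was already involved in an earlier shuffle
        chosen.append((a, b))
        used.update((a, b))
    return chosen
```

In the full procedure the gains would be recomputed after each pass, and passes would be repeated until a pass selects no exchange at all.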



4.4 Hybrid tabu search algorithms 

After extensive testing of hybrid versions of the tabu search algorithm and the 
above heuristics, two algorithms were developed. The first one produces schedules 
when a very short calculation time is required (as it often is). The second 
algorithm needs more calculation time but generates schedules of a considerably 
higher quality. 



Tabu search + diversification: TS1. The aim of this algorithm is to provide 
reliable solutions in a very short time. In practice this algorithm has proved to 
be very useful to check whether the constraints are realistic, whether during the 
holiday periods it will be possible to plan good schedules if every person gets 
their desired holiday period, etc. 

The algorithm is constructed quite simply from the original tabu search algo- 
rithm. If after a number of iterations no improvement is found, the weekend 
step is performed. If the weekend step does not result in a different schedule 
the second diversification step is performed. After this diversification step, the 
original tabu search algorithm is used again and so on. The calculations stop 
after a number of iterations without improvement. 
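
The control flow of TS1 can be sketched as follows. The four callables (cost, tabu_search, weekend_step, worst_step) and the parameter max_stale_rounds are placeholders introduced for this example; the sketch only shows the order in which the steps are combined, under the assumption that each step returns a new schedule that can be compared with the old one.

```python
def ts1(start, cost, tabu_search, weekend_step, worst_step, max_stale_rounds=3):
    """Alternate tabu search with the two diversification steps.

    tabu_search is assumed to run until it stops improving and to return
    the schedule it ended with; lower cost is better.
    """
    current = best = tabu_search(start)
    stale = 0
    while stale < max_stale_rounds:
        diversified = weekend_step(current)
        if diversified == current:
            # The weekend step changed nothing: try the second diversification.
            diversified = worst_step(current)
        current = tabu_search(diversified)
        if cost(current) < cost(best):
            best, stale = current, 0
        else:
            stale += 1
    return best
```

TS2 would follow the same loop and then run the greedy shuffling on the result until no exchange improves the cost function.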



Tabu search + greedy shuffling: TS2. This requires more time but the 
results are considerably better from the human point of view. Anecdotal evidence 
suggests that the level of satisfaction with schedules produced by this algorithm 
is actually higher than the cost function indicates. The main reason for this is 
that after the shuffling step the users cannot easily improve the results. 

It is important to do the greedy shuffling step at the end of the calculations 
because its real aim is to perform the exchanges a human user would perform. It 
is because of the exhaustive search character of the shuffling that this step takes 
a lot of time. It is very important to calculate this step until there are no further 
improvements because otherwise the goal of excluding manual improvements to 
the schedule might be lost (Greedy shuffling in section 4.3). 

5 Test results 

The tests in this paper are restricted to planning the minimal requirements 
(R-min), planning between the minimal and the preferred requirements 
(R-min-pref), and planning according to the calculated demands (R-calc) as explained 
in section 3.1 (Hard constraints). For the latter we decided to do the step ’add 
Problem 1            R-min             R-min-pref        R-calc
                     Value   Time      Value   Time      Value   Time
steepest descent     2594    1’26”     2395    1’37”     2657    1’36”
tabu search          2435    2’05”     2214    2’06”     1928    1’59”
ts stop crit. x50    1915    40’58”    1675    41’21”    1534    23’58”
ts1                  1341    6’00”     1089    5’59”     929     5’27”
ts2                  1264    20’15”    1011    24’39”    809     28’08”

Table 1. Value of the evaluation function and results of the steepest descent and 
variants of the hybrid tabu search algorithm for Problem 1, planning order of the 
qualifications as chosen by the customer 



Problem 2            R-min             R-min-pref             R-calc
                     Value   Time      Value   Time           Value   Time
steepest descent     1338    44”       1338    45”            1134    47”
tabu search          1189    57”       1189    [illegible]    933     1’03”
ts1                  843     3’18”     843     3’18”          867     2’14”
ts2                  809     6’25”     809     6’25”          588     10’19”

Table 2. Value of the evaluation function and results of the steepest descent and 
variants of the hybrid tabu search algorithm for Problem 2, planning order of the 
qualifications as chosen by the customer 



duties towards preferred personnel requirements’ whenever this does not cause 
a violation of the soft constraints (section 3.2). 

In Tables 1 and 2 the results of the variants of the tabu search algorithm are 
compared to the steepest descent algorithm. The test examples Problem 1 and 
Problem 2 are hard-to-solve real-world problems, and in both cases the personal 
demands make a good schedule almost impossible. 

The column ’Value’ shows the value of the evaluation function (cost parameter 
per constraint times the extent to which the constraint is violated). The column ’Time’ 
contains the calculation times on an IBM Power PC RS6000. 
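
As a small worked example of how such a value arises, the sketch below multiplies each constraint's cost parameter by the extent of its violation and sums the results. The summation over constraints, the constraint names and the numbers are assumptions chosen for illustration; they are not taken from Plane or from the tables.

```python
def evaluate(violations, cost_parameters):
    """Sum of (cost parameter x extent of violation) over all constraints."""
    return sum(cost_parameters[name] * extent
               for name, extent in violations.items())


# Hypothetical constraints and weights, for illustration only.
cost_parameters = {"overtime": 10, "incomplete_weekend": 100}
violations = {"overtime": 3, "incomplete_weekend": 2}
value = evaluate(violations, cost_parameters)  # 10*3 + 100*2
```

A lower value therefore means fewer or less heavily weighted violations, which is why the smaller entries in the ’Value’ columns are the better schedules.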

The third set of results, where the demands are adapted to the constraints as 
described in section EU (calculating more realistic demands), are better than the 
results in the first column. In Problem 2, there was no difference between the 
minimal and the required demands. 

For all the considered examples, the tabu search algorithm performs better than 
the steepest descent algorithm. We decided to organise the stop criterion for 
the tabu search algorithm so that the calculation time is of the same order of 
magnitude as the time required to do steepest descent. Only Table 1 contains 
the results of the original tabu search algorithm for a longer calculation time. 
The behaviour of the hybrid algorithms is better than the behaviour of the 
normal tabu search algorithm (with a short calculation time). Even considering 
the calculation time, for use in practice it is worth using the hybrid tabu search 
algorithm because the degree of confidence the users have in the program is 
much higher. 
6 Conclusion 

By automating the nurse rostering problem, the scheduling effort and calculation 
time are reduced considerably. The evaluation of schedules is very quick for 
all possible combinations of constraints and the quality of the automatically 
produced schedules is much higher than the quality of the manual schedules. 
The users of Plane often place an emphasis on the higher quality of the solution 
because the system provides an objective schedule in which all nurses are treated 
equally and in which the number of violated constraints is very low. Combining 
the simple tabu search algorithm with some specific problem solving heuristics 
not only guarantees better quality rosters but also satisfies the users of the system 
to a very high extent because it is almost impossible for experienced planners to 
improve the results (considering the constraints) manually. For many practical 
scheduling problems the higher quality of the solutions produced by the hybrid 
algorithm compared to the simple tabu search algorithm compensates for the 
increase in calculation time. 
Reinforcement learning (RL) concerns the problem of a learning agent 
interacting with its environment to achieve a goal. Instead of being given examples of 
desired behavior, the learning agent must discover by trial and error how to be- 
have in order to get the most reward. RL has become popular as an approach to 
artificial intelligence because of its simple algorithms and mathematical founda- 
tions (Watkins, 1989; Sutton, 1988; Bertsekas and Tsitsiklis, 1996) and because 
of a string of strikingly successful applications (e.g., Tesauro, 1995; Crites and 
Barto, 1996; Zhang and Dietterich, 1996; Nie and Haykin, 1996; Singh and Bert- 
sekas, 1997; Baxter, Tridgell, and Weaver, 1998). An overall introduction to the 
field is provided by a recent textbook (Sutton and Barto, 1998). Here we summa- 
rize three stages in the development of the field, which we coarsely characterize 
as the past, present, and future of reinforcement learning. 

RL past, up until about 1985, developed the general idea of trial-and-error 
learning — of actively exploring to discover what to do in order to get reward. 
It was many years before trial-and-error learning was recognized as a significant 
subject for study different from supervised learning and pattern recognition. 
RL past emphasized the need for an active, exploring agent, as in the studies 
of learning automata and of the n-armed bandit problem. Another key insight 
of RL past was just the idea of a scalar reward signal as a simple but general 
specification of the goal of an intelligent agent, an idea which I like to highlight by 
referring to it as the reward hypothesis. The learning methods of RL past usually 
learned only a policy, a mapping from perceived states of the world to the action 
to take. This limited them to relatively benign problems in which reward was 
immediate and indicated (e.g., by its sign) whether the behavior was good or 
bad. Problems with delayed reward, or in which the best action must be picked 
out of several good actions (or the least bad out of several bad actions), could 
not be reliably solved until the ideas of value functions and temporal-difference 
learning were introduced in the 1980s. 

The transition to RL present (≈ 1985) came about by focusing on value 
functions and on a general mathematical characterization of the RL problem known 
as Markov decision processes (MDPs). The state-value function, for example, is 
the function mapping perceived states of the world to the expected total future 
reward starting from that state. Almost all sound methods for solving MDPs 
(that is, for finding optimal behavior) are based on learning or computing ap- 
proximations to value functions, and the most efficient methods for doing this all 

* The slides used in the talk corresponding to this extended abstract can be found at 
http://envy.cs.umass.edu/~rich/SEAL98/sld001.htm. 
seem to be based on temporal differences in estimated value (as in dynamic 
programming, heuristic search, and temporal-difference learning). Although finding 
a policy to maximize reward is still the ultimate goal of RL, RL present is much 
more focused on the intermediating goal of approximating value, from which the 
optimal policy can be determined. RL present is also as much about planning 
using a model of the world as it is about learning from interaction with the 
world. Whether learning or planning optimal behavior, approximation of value 
functions seems to be at the heart of all efficient methods for finding optimal 
behavior. The value function hypothesis is that approximation of value functions 
is the dominant purpose of intelligence. 
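
For readers who prefer a formula, the state-value function and the temporal difference mentioned above can be written as follows. The notation is the standard textbook one and is an assumption here, since the abstract itself introduces no symbols: $\pi$ is the policy being followed, $r_{t+1}$ the reward received after time $t$, and $\gamma$ a discount factor, with $\gamma = 1$ giving the plain total of future reward.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right],
\qquad 0 \le \gamma \le 1,
\qquad
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t).
```

Here $\delta_t$ is the difference between two successive estimates of value; temporal-difference methods adjust the estimate $V(s_t)$ in the direction of $\delta_t$.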

RL future has yet to happen, of course, but it may be useful to try to guess 
what it will be like. Just as RL present took a step away from the ultimate 
goal of reward to focus on value functions, so RL future may take a further step 
away to focus on the structures that enable value function estimation. Principal 
among these are representations of the world’s state and dynamics. It is 
commonplace to note that the efficiency of all kinds of learning is strongly affected by 
the suitability of the representations used. If the right features are represented 
prominently, then learning is easy; otherwise it is hard. It is time to consider 
seriously how features and other structures can be constructed automatically by 
machines rather than by people. In RL, representational choices must also be 
made about states (e.g., McCallum, 1995), actions (e.g., Sutton, Precup, and 
Singh, 1998) and models of the world’s dynamics (Precup and Sutton, 1998), all 
of which can strongly affect performance. In psychology, the idea of a developing 
mind actively creating its representations of the world is called constructivism. 
My prediction is that for the next tens of years RL will be focused on construc- 
tivism. 
Abstract. This paper proposes a new Q-learning method for the case 
where the states (conditions) and actions of systems are assumed to be 
continuous. The components of Q-tables are interpolated by fuzzy 
inference. The initial set of fuzzy rules is made of all the combinations 
of conditions and actions relevant to the problem. Each rule is then 
associated with a value by which the Q-value of a condition/action pair 
is estimated. The values are revised by the Q-learning algorithm so as 
to make the fuzzy rule system effective. Although this framework may 
require a huge number of initial fuzzy rules, we show that a considerable 
reduction can be achieved by using what we call “Condition Reduced 
Fuzzy Rules (CRFR)”. The antecedent part of a CRFR consists of all the 
actions and the selected conditions, and its consequent is set to be its 
Q-value. Finally, experimental results show that controllers with CRFRs 
perform equivalently to the system with the most detailed fuzzy control 
rules, while the total number of parameters that have to be revised 
through the whole learning process is reduced and the number of 
parameters revised at each step of learning is increased. 

Key Words: Q-learning, fuzzy rule, interpolation, reduced condition. 



1 Introduction 

When solving problems with various types of I/O data that are related to 
each other in complicated ways, extracting embedded rules from these I/O 
data manually becomes quite complicated and hence is sometimes practically 
impossible. 

Recently, reinforcement learning methods have been successfully applied to 
various kinds of problems. Among them, Q-learning [1] is one of the most widely used 
methods; it employs Q-functions for evaluating condition/action pairs. One 
of the simplest ways to realize a Q-function is to look up a Q-table. Assume 
an environment that has n conditions for taking m actions. Each cell of the 
(n + m)-dimensional Q-table holds the value of one of the condition/action pairs 
(Q-value). The values are revised through the whole learning process. A Q-table 
is simple, but its size will explode when applied to practical problems in which 



A Reinforcement Learning with Condition Reduced Fuzz Rules 






continuous-valued conditions and actions are sometimes involved. In addition, 
only a portion of a Q-table is revised at each step of learning, and hence we 
cannot make good use of the continuity of condition/action values. Thus, we 
propose a new framework where fuzzy reasoning is introduced to continuously 
interpolate the Q-values. 

Q-learning requires only experienced combinations of conditions, actions and 
rewards. Therefore, Q-learning can be applied to problems where meaningful I/O 
sets cannot be specified beforehand. In such problems, fuzzy rules cannot be 
specified beforehand by referring to domain-specific heuristics either; that is, the 
initial rule set has to be made of all the combinations of conditions and actions. 
This framework may lead to an explosion of the initial rule set. To cope with this 
problem, we introduce fuzzy reasoning with “condition reduced fuzzy rules” to 
Q-learning. 

2 Q-learning with Condition Reduced Fuzzy Rules 

2.1 Interpolating Q-tables 

For utilizing the continuity of I/O values, many methods for interpolating cells 
of Q-tables have been proposed. We focus our attention on the methods where 
the CMAC [2] architecture or fuzzy reasoning [3] is incorporated. 

The Cerebellum Model Arithmetic Computer (CMAC) has been introduced 
to reinforcement learning [2], where a Q-table is represented by multi-layered 
tables whose partition of the cells is set to be cruder than the original table. 
Figure 1 illustrates the idea, where a one-dimensional Q-table with 9 cells (Figure 
1(a)) is represented by the combination of three tables with 3 + 4 + 4 (= 11) 
cells (Figure 1(b)). 

In case (a), 9 parameters are associated with these 9 cells, whereas in case (b) 
11 parameters are required. Through learning processes, these parameters have 
to be revised correctly. Thus the increase in the total number of parameters seems 
to be a disadvantage of CMAC (b), but CMAC enhances the effect of learning. As 
shown by the meshed part in Figure 1, only one cell is revised at each step of 
learning in case (a), whereas the effect of each revision is not limited to one cell 
in case (b). 
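
The 9-cell example can be reproduced with a short sketch. The layout is an assumption: three coarse tables with cells three units wide, each shifted by one unit relative to the previous one, which yields the 3 + 4 + 4 cells mentioned above; the figure may place the tables differently. The function names are illustrative.

```python
def tilings(n_cells=9, width=3):
    """One coarse table per offset; each maps a fine cell index to a coarse cell index."""
    return [[(cell + offset) // width for cell in range(n_cells)]
            for offset in range(width)]


def n_parameters(tables):
    """Total number of coarse cells, i.e. parameters to be learned."""
    return sum(max(table) + 1 for table in tables)


def update(weights, tables, cell, delta):
    """Spread delta equally over the coarse cells covering `cell`, one per table."""
    for w, table in zip(weights, tables):
        w[table[cell]] += delta / len(tables)


def value(weights, tables, cell):
    """The interpolated value of a fine cell is the sum over its covering coarse cells."""
    return sum(w[table[cell]] for w, table in zip(weights, tables))


tables = tilings()
weights = [[0.0] * (max(table) + 1) for table in tables]
update(weights, tables, cell=4, delta=3.0)
```

After this single revision of cell 4, its neighbours 3 and 5 each share two of the three coarse cells and so receive part of the change, while a distant cell such as 8 is unaffected. This is the sense in which a revision is not limited to one cell.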

Fuzzy reasoning has also been successfully used to interpolate the cells of 
Q-tables [3]. This method estimates a Q-value for an action by fuzzy reasoning. 
The antecedent part of each rule consists of conditions and actions, and the 
consequent shows the Q-value of the rule. The framework of the learning process is 
almost the same as the basic Q-learning algorithm [1], except for the way of estimating 
Q-values by using the Takagi-Sugeno method [4], the way of selecting an action and 
the way of learning by revising the Q-value of each fuzzy rule. 

2.2 Condition Reduced Fuzzy Rule 

We propose fuzzy reasoning with Condition Reduced Fuzzy Rules (CRFR) and 
incorporate it into Q-learning. The antecedent part of each CRFR consists of all 
the possible actions and the selected conditions. Table 1 shows parameters for 
Fig. 1. Interpolating a Q-table (a) by Cerebellar Model (b) or Fuzzy Sets (c) 



generating the initial fuzzy rules. Let us assume that each condition and action 
involves fuzzy sets and that their number is uniformly $N_f$. Then, in the case of 
$N_c + N_a = 3$ and $N_f = 5$, the normal form of fuzzy rules is represented in 
the following form: 

$$\text{if } (X_1 \text{ is } P_{1y_1}) \wedge (X_2 \text{ is } P_{2y_2}) \wedge (X_3 \text{ is } P_{3y_3}) \text{ then } (Q_z = C_z),$$

where $P_{xy}$, $Q_z$ and $C_z$ denote a fuzzy set, a Q-variable and its value, respectively. 
The combinations of $y_1$, $y_2$ and $y_3$ such that $1 \le y_1, y_2, y_3 \le 5$ yield 125 ($= 5^3$) 
fuzzy rules. On the other hand, in the case of $N_r + N_a = 2$, the CRFRs are given 
as 

$$\text{if } (X_1 \text{ is } P_{1y_1}) \wedge (X_2 \text{ is } P_{2y_2}) \text{ then } (Q_l = C_l)$$
$$\text{if } (X_2 \text{ is } P_{2y_2}) \wedge (X_3 \text{ is } P_{3y_3}) \text{ then } (Q_m = C_m)$$
$$\text{if } (X_1 \text{ is } P_{1y_1}) \wedge (X_3 \text{ is } P_{3y_3}) \text{ then } (Q_n = C_n),$$

where the total number of CRFRs is 75 ($= 3 \times 5^2$). 
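
The two rule counts can be checked by enumeration. The sketch below follows the example literally, i.e. it forms one group of reduced rules for every pair of the three variables; the function names are illustrative and the enumeration is only meant to confirm the arithmetic.

```python
from itertools import combinations, product
from math import comb


def normal_rules(n_vars, n_sets):
    """Every rule fixes one fuzzy-set label (1..n_sets) for each variable."""
    return list(product(range(1, n_sets + 1), repeat=n_vars))


def reduced_rules(n_vars, n_selected, n_sets):
    """Every rule names n_selected variables and one label for each of them."""
    return [(variables, labels)
            for variables in combinations(range(1, n_vars + 1), n_selected)
            for labels in product(range(1, n_sets + 1), repeat=n_selected)]


n_normal = len(normal_rules(3, 5))       # 5**3
n_reduced = len(reduced_rules(3, 2, 5))  # comb(3, 2) * 5**2
```

The saving grows quickly with the number of variables, because the exponent in the rule count is the number of variables per rule rather than the total number of variables.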

Table 1. Parameters for generating initial fuzzy rules 



$N_c$: the total number of conditions 
$N_r$: the number of conditions which are included in each CRFR 
$N_a$: the total number of actions 
$N_f$: the number of fuzzy sets for each condition and action 
$N_g$: the number of fuzzy sets having nonzero grades for an arbitrary input 



2.3 Introducing CRFR to Q-learning 

The framework of Q-learning with CRFR is given in Table 2. In the experiment
described in the next section, the Q-value of a condition/action pair is estimated
by the mean of the $C_i$ weighted by $\omega_i$, i.e., $Q = (\sum_i \omega_i C_i) / \sum_i \omega_i$, where each
$\omega_i$ is given by the algebraic product of the grades of the antecedents of rule $i$. The action for
the current condition is randomly selected with probabilities calculated from
the Q-values of the assumed actions. Each time an action is selected, the Q-values of
the rules that contributed to the process of selecting the action ($\omega_i \neq 0$) are revised.
The obtained reward is distributed to the rules so that $\Delta C_i$ is proportional to
$\omega_i$, i.e., $\Delta C_i \propto \omega_i$.
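The estimation and the credit distribution described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and dividing by the sum of the weights in `distribute` is an assumption, since the text states only that the change of each rule value is proportional to its weight.

```python
def rule_weight(grades):
    """Algebraic product of the antecedent grades of one rule."""
    w = 1.0
    for g in grades:
        w *= g
    return w

def estimate_q(weights, consequents):
    """Weighted mean of the rule values C_i, with weights w_i."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no rule fires; returning 0 is an arbitrary choice
    return sum(w * c for w, c in zip(weights, consequents)) / total

def distribute(delta_q, weights, consequents):
    """Return updated C_i, each changed in proportion to its weight w_i.

    Normalising by sum(weights) is an assumption made for this sketch.
    """
    total = sum(weights)
    if total == 0.0:
        return list(consequents)
    return [c + delta_q * w / total for w, c in zip(weights, consequents)]

weights = [rule_weight([0.5, 0.5]), rule_weight([0.5, 1.0]), 0.0]
consequents = [1.0, 4.0, 10.0]
q = estimate_q(weights, consequents)
updated = distribute(0.75, weights, consequents)
```

A rule with zero weight keeps its value, which matches the statement that only the contributing rules are revised.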



Table 2. The framework of Q-learning with CRFR 



1. Initialize Ci (value of Qi) 

2. repeat forever 

(a) repeat T times 

i. Randomly assume (select) a set of actions. 

ii. Calculate the grade of each rule ($\omega_i$) under the current conditions and
assumed actions.

iii. Estimate the Q-value for the current conditions/actions pair.

(b) (after the second cycle)

i. Calculate $\Delta Q$ for the last conditions/actions pair by the standard
Q-learning update, i.e., the equation proposed in [1].

ii. Distribute $\Delta Q$ to the CRFRs' $\Delta C_i$.

(c) Select and execute one of the assumed (selected) actions. 

(d) Observe the next state and reinforcement signal. 



2.4 Comparison of Learning Efficiency 

In order to compare the performance of learning algorithms, we examine the total 
number of the parameters to be revised through the whole learning process, the 
size of revised parameters at each step of learning and the generality of the 
following algorithms: 

(a) standard Q-learning (QL) 

(b) Q-learning with interpolation by CMAC (QL+CMAC)

(c) Q-learning with interpolation by normal Fuzzy Rule (QL+Fuzzy)

(d) Q-learning with interpolation by CRFR (QL+CRFR)

In order to examine the number of parameters, we assume that each condition
and each action is uniformly partitioned into S regions and that, when
fuzzy rules are used, the number of fuzzy sets (labels) for each condition/action is equal
to Nf.

QL: The total number of parameters is equal to the number of all the
combinations of conditions and actions, as shown in Table 3(a). The rule revision at
each step of learning is localized to one cell of the Q-table.

QL+CMAC: The total number of parameters depends on the number ($k$) of
tables and the width ($q_s$) of their cells. Table 3(b) shows the number of
parameters when no redundancy is allowed, i.e., $q_s$ is set to $k \times i_s$, where $i_s$
denotes the width of the cells of the normal Q-table, as shown in Figure 1(b).

QL+Fuzzy: The total number of parameters depends on the number of
consequent parts of the rules. Table 3(c) shows the number when only one
parameter is associated with each rule. In order to estimate the number of
rules revised at each step of learning, we assume that the fuzzy sets are placed as in
Figure 1(c), so that the same number (Ng) of fuzzy sets have nonzero grades for
any input value. Table 3(c) shows the number under this assumption.

QL+CRFR: Table 3(d) shows the total number and the revised number of
parameters in case (d). They reveal that the total number of parameters is smaller
than that in case (c) when $N_f^{N_c - N_r} > {}_{N_c}C_{N_r}$. They also show that the number
of revised parameters is larger than that in case (c) when $N_g^{N_c - N_r} < {}_{N_c}C_{N_r}$.
We can set Nr such that these two conditions hold simultaneously.
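The counts compared above can be evaluated numerically. The sketch below assumes the closed forms $S^{N_c+N_a}$ for QL, $(S/k)^{N_c+N_a} + (k-1)(S/k+1)^{N_c+N_a}$ for QL+CMAC, $N_f^{N_c+N_a}$ and $N_g^{N_c+N_a}$ for QL+Fuzzy, and ${}_{N_c}C_{N_r}$ times the same powers with $N_r$ in place of $N_c$ for QL+CRFR. The function names are illustrative, and $N_g = 2$ in the checks is an assumption, chosen because it is consistent with the value 256 reported in Table 4.

```python
from math import comb

def counts_ql(s, nc, na):
    """(total, revised per step) for a plain Q-table with s regions per axis."""
    return s ** (nc + na), 1

def counts_cmac(s, k, nc, na):
    """(total, revised per step) for k shifted tables; assumes k divides s."""
    cells = s // k  # cells per axis in the unshifted table
    total = cells ** (nc + na) + (k - 1) * (cells + 1) ** (nc + na)
    return total, k

def counts_fuzzy(nf, ng, nc, na):
    """(total, revised per step) for rules that use every condition."""
    return nf ** (nc + na), ng ** (nc + na)

def counts_crfr(nf, ng, nc, na, nr):
    """(total, revised per step) for rules that use nr of the nc conditions."""
    c = comb(nc, nr)
    return c * nf ** (nr + na), c * ng ** (nr + na)

# Settings of the boat experiment: Nc = 6, Na = 2, Nf = 5, assumed Ng = 2.
for nr in range(6, 0, -1):
    print(nr, counts_crfr(5, 2, 6, 2, nr))
```

With these settings the loop gives 234,375 total and 960 revised parameters for Nr = 4, in line with the rounded figures of Table 4.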



Table 3. Number of parameters in the interpolating methods 





Method         total                                           revised at each step

(a) QL         $S^{N_c+N_a}$                                   1
(b) QL+CMAC    $(S/k)^{N_c+N_a} + (k-1)(S/k+1)^{N_c+N_a}$      $k$
(c) QL+Fuzzy   $N_f^{N_c+N_a}$                                 $N_g^{N_c+N_a}$
(d) QL+CRFR    ${}_{N_c}C_{N_r}\,N_f^{N_r+N_a}$                ${}_{N_c}C_{N_r}\,N_g^{N_r+N_a}$



3 Experiments and Results 

3.1 Experimental Environment 

Figure 2 illustrates the experimental problem. The learning methods in this case
yield rules for controlling boats so that they go around the racing track. The state
variables (the conditions for selecting an action) of this system are the current location
$(x, y)$, the velocity $(sx, sy)$, the direction $(r)$ and the angular velocity $(\omega)$ of the
boat, and the action is the combination of operating the "steering wheel ($hdl$)" and
the "acceleration lever ($acc$)". In this case, Nc = 6 and Na = 2. The value of T (cf.
Table 2, step (a)) is set to 25.

States are calculated by $sx_{(t+1)} = 0.8\,sx_{(t)} + acc \cdot \cos(r_{(t)})$, $sy_{(t+1)} = 0.8\,sy_{(t)} + acc \cdot \sin(r_{(t)})$,
$\omega_{(t+1)} = 0.8\,\omega_{(t)} + hdl$ and $r_{(t+1)} = r_{(t)} + \omega_{(t)}$. The large time
constant of the boat makes the control task difficult.
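The state equations can be turned into a small simulation step. This is a sketch under assumptions: the velocity and angular-velocity updates follow the equations in the text, whereas the position update and the use of the previous angular velocity in the heading update are assumptions, and the names `Boat` and `step` are hypothetical.

```python
import math
from dataclasses import dataclass

DECAY = 0.8  # velocity retention factor from the state equations

@dataclass
class Boat:
    x: float = 0.0   # location
    y: float = 0.0
    sx: float = 0.0  # velocity
    sy: float = 0.0
    r: float = 0.0   # direction (radians)
    w: float = 0.0   # angular velocity

def step(b: Boat, acc: float, hdl: float) -> Boat:
    """Advance one control step.

    Assumed, not stated in the text: x and y advance by the current
    velocity, and the heading advances by the current angular velocity.
    """
    return Boat(
        x=b.x + b.sx,
        y=b.y + b.sy,
        sx=DECAY * b.sx + acc * math.cos(b.r),
        sy=DECAY * b.sy + acc * math.sin(b.r),
        r=b.r + b.w,
        w=DECAY * b.w + hdl,
    )

first = step(Boat(), acc=1.0, hdl=0.0)
second = step(first, acc=0.0, hdl=0.0)
```

Because only 20% of the velocity is lost per step, a single control input keeps acting on the boat for many steps, which illustrates why the large time constant makes the task hard.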

Fuzzy membership functions are set as shown in Figure 1(c). Rewards in
this case are set to be inversely proportional to the "distance between the boat
and the nearby local target", as shown in Figure 2, and each time a boat collides
with a fence it is penalized. Penalties are given as rewards with a certain
negative value.



Fig. 2. Experimental problem (boat racing track, with turning points marked)



3.2 Comparison with Interpolation Method 

In order to compare the performance of each method under the condition that
each method requires almost the same number of total parameters, we set S for
QL, S for QL+CMAC, k and Nf to be 5, 24, 8 and 5, respectively.

Table 4 shows the number of times the boat could go around the track after
500,000 steps of learning, and also the number of collisions with the fences. We
call the number of control signals needed for going around the track the "lap time". Table
4 also shows the average lap time and the average number of collisions for one round
of the track. The upper part of Figure 3 shows the learning processes of these
methods. In the figure, the lap times averaged over 10,000 rounds of QL+CMAC,
QL+Fuzzy and QL+CRFR (in the case of Nr = 4) are shown by the dashed thin
line, the thin line and the thick line, respectively.

The result shows that QL+CMAC could not achieve performance equivalent
to that of the fuzzy-based methods, and that the normal QL could not achieve
performance equivalent to that of any other method.

Table 4. Performance of each interpolating method after 500,000 steps of learning 





                     number of parameters             total number          average
Method         Nr    total (x 1,000)  each revision   rounds   collisions   lap time   collisions

(a) QL         -     391              1               755      11,109       660.877    14.685
(b) QL+CMAC    -     465              8               3,470    3,357        144.057    0.967
(c) QL+Fuzzy   -     391              256             4,469    1,328        111.865    0.297
(d) QL+CRFR    6     391              256             4,469    1,328        111.865    0.297
               5     469              768             4,489    1,561        111.358    0.348
               4     234              960             4,584    1,370        109.058    0.299
               3     63               640             4,426    1,774        112.960    0.401
               2     9                240             3,838    4,542        130.268    1.183
               1     1                48              2,179    10,288       229.415    4.721




3.3 Performance of Condition Reduction 

The fundamental difference between the proposed method and QL+Fuzzy is the
introduction of Nr. QL+Fuzzy can be seen as a special case of the proposed
method in which Nr is always set to Nc. Table 4 shows the results for all the
possible values of Nr (1 to Nc) after 500,000 steps of learning. The lower part
of Figure 3 shows the lap time averaged over 10,000 rounds.



Fig. 3. Comparison of the learning processes of the interpolating methods (vertical axis: average lap time)



Generally speaking, it is expected that the rules using all the conditions (the
most detailed rules) will yield better performance than the rules with a reduced
number of conditions. However, Table 4 and Figure 3 reveal that the result with
CRFR is equivalent to or even better than that with the most detailed rules.
When Nr is set to 1 or 2, the number of parameters is so small that the
controller cannot achieve good performance. The ratio of the number of
parameters revised at each step to the total number of parameters, $(N_g/N_f)^{N_r+N_a}$,
shows that Nr should be set small in order to enhance the effect of each
step of learning, but the results show that it is difficult to set it to a very small
number.



3.4 Robustness to the Complexity of Environments 

When other boats are present in the environment, the controller has to take
into account not only their locations ($ox_i$, $oy_i$) but also their velocities ($osx_i$,
$osy_i$), since they are moving. The controller learns a skill for avoiding moving
obstacles by receiving negative rewards when it collides with another boat.

In this case, the relation between conditions and actions is extremely
complicated. When two boats are in the environment (Nc = 10), even if Nr
is set to a small value, e.g. 3, the total number of parameters and the number revised at each
step of learning are calculated to be 375,000 and 3,840, respectively.

After 500,000 steps of learning, the two boats showed performances almost
equivalent to the case where they are trained alone. We also examined the
relation between the average lap time and the number of learning steps, which showed that
the two boats attained the level of 120 steps/round within 1,500,000 steps.
Comparing this with the result shown in Figure 3, we can say that the proposed method
is largely unaffected by the complexity of the environment.

4 Conclusion 

We have proposed a method for applying Q-learning to problems that involve
continuous I/O data, in which learning/adaptation is done by changing the
Q-values of the Condition Reduced Fuzzy Rules. The experimental results
demonstrated the satisfactory performance of the proposed method. The method is
applicable to problems where domain-specific heuristics are not known
beforehand. Furthermore, we can expect that it will learn novel actions. When we
do not intend to utilize the continuity of I/O values, nor to yield symbolic
representations, Q-learning may be integrated with other generalization methods,
e.g. neural networks. What distinguishes this method is that it recasts the outputs of
the learned system as human-readable symbolic rules.



Abstract. Hierarchical fuzzy modeling is a promising technique to describe
input-output relationships of nonlinear systems with multiple inputs.
This paper presents a new method of dividing input spaces for hierarchical
fuzzy modeling using Fuzzy Neural Network (FNN) and Genetic
Algorithm (GA). Uneven division of input space for each submodel in
the hierarchical fuzzy model can be achieved with the proposed method.
The obtained hierarchical fuzzy models are likely to be more concise
and more precise than those identified with the conventional methods.
Studies on effects of the weights on performance indices for the fuzzy
model are also shown in this paper.



1 Introduction 

Fuzzy modeling is a method to describe the characteristics of nonlinear systems
using fuzzy rules. For automatic acquisition of fuzzy rules, combinations of
fuzzy logic and neural networks have been studied. The Fuzzy Neural Network
(FNN) presented by the authors is capable of identifying fuzzy rules and tuning the membership
functions by means of back propagation learning. This FNN has been applied
to the fuzzy modeling of nonlinear systems. Sufficient data covering the whole
input space are hard to obtain from an actual plant with many input variables.
To cope with this, a hierarchical fuzzy modeling method using multiple FNNs was proposed. Each
sub-model in the hierarchical fuzzy model has a smaller number of input variables,
and it does not need many data to describe the input-output relationships
in its subspace.

Karr et al. proposed a combination of fuzzy logic and Genetic Algorithm
(GA). GA finds fuzzy rules using the payoff for the success/failure of its actions.
GA was also applied to identification of the hierarchical structure of a fuzzy model from
given input-output pairs of data. The authors have applied GA to the selection
of input variables of the FNN hierarchical model. This method is very effective in
the case where the plant has a strong nonlinearity. GA can find appropriate
sets of input variables and a proper number of membership functions for each
selected input variable from many candidates. The authors have also proposed
a fuzzy modeling method which realizes uneven division of input spaces using
the FNN, and have applied the method to hierarchical fuzzy modeling.

This paper presents a new hierarchical fuzzy modeling method using the
FNN and GA. The proposed method can find fuzzy submodels with unequally
divided input space.

2 Hierarchical Fuzzy Modeling Using FNN

2.1 Fuzzy Neural Network

The FNN presented by the authors is a multi-layered back-propagation (BP)
model with a specially designed structure for easy extraction of fuzzy rules from
the trained NN. This paper uses Type I of the FNNs. Fig. 1(a) shows an
example of the FNN.

This is a case where the FNN has two inputs $x_1$ and $x_2$, one output $y$ and
three membership functions for each input. The back-propagation (BP) learning
algorithm can be applied to modify the connection weights $w_c$, $w_g$ and $w_b$. From
this network, the following simplified fuzzy inference can be extracted:

$$R^i: \text{if } x_1 \text{ is } A_{i1} \text{ and } x_2 \text{ is } A_{i2} \text{ then } y = b_i \quad (i = 1, 2, \ldots, n) \tag{1}$$

$$\mu_i = A_{i1}(x_1)\,A_{i2}(x_2) \tag{2}$$

$$\hat{\mu}_i = \frac{\mu_i}{\sum_{k=1}^{n} \mu_k} \tag{3}$$

$$y = \sum_{i=1}^{n} \hat{\mu}_i\, b_i \tag{4}$$

where $R^i$ is the $i$-th fuzzy rule, $A_{i1}$ and $A_{i2}$ are labels of membership functions,
$b_i$ is a singleton in the consequence, $n$ is the number of fuzzy rules, $\mu_i$ is the truth
value of $R^i$, $\hat{\mu}_i$ is the normalized truth value, and $y$ is the inferred value.

(a) Fuzzy Neural Network, with layers (A) to (F)   (b) Membership Functions
Fig. 1. Fuzzy Neural Network

(a) Hierarchical Fuzzy Model (1st layer, 2nd layer)   (b) Example of Chromosome
Fig. 2. Hierarchical fuzzy model using FNNs and chromosomes

Fig. 1(b) shows an example of membership functions in the antecedent formed
in the (A)-(D)-layers. The connection weights $w_c$ and $w_g$ determine the positions and
slopes, respectively, of the sigmoid functions $f$ in the units in the (C)-layer. Each
membership function consists of one or two sigmoid functions. The outputs of
the units in the (D)-layer are the values of the membership functions. The products of
these values are the inputs to the units in the (E)-layer, and the outputs of these units
are the normalized truth values in the antecedent, $\hat{\mu}_i$ in eq. (3). The output of
the unit in the (F)-layer is the sum of the products of the connection weights $w_b$
and the normalized truth values. The connection weights $w_b$ correspond to the
singletons in the consequence, $b_i$ in eq. (1). The output in the (F)-layer is, therefore,
the inferred value $y$ in eq. (4).
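The simplified fuzzy inference extracted from the network can be illustrated in a few lines. The sketch below takes the membership grades as given rather than computing them from sigmoid functions, so it covers only the product, normalization and weighted-sum stages of the (E)- and (F)-layers; the function name `infer` and the data layout are assumptions made for illustration.

```python
def infer(memberships, singletons):
    """Simplified fuzzy inference for one input vector.

    memberships[i][j] is the grade of input j in the membership function
    used by rule i; singletons[i] is the consequent value b_i of rule i.
    """
    truth = []
    for grades in memberships:
        mu = 1.0
        for g in grades:
            mu *= g            # truth value: product of antecedent grades
        truth.append(mu)
    total = sum(truth)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    normalized = [mu / total for mu in truth]   # normalized truth values
    return sum(m * b for m, b in zip(normalized, singletons))

# Two rules, two inputs: truth values 0.25 and 0.5, singletons 0 and 3.
y = infer([[0.5, 0.5], [0.5, 1.0]], [0.0, 3.0])
```

Because the output is a convex combination of the singletons, each singleton can be read off directly as the consequent of one rule, which is what makes rule extraction from this structure easy.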

Since the center-of-gravity method is used in the (E)-layer, the updating method
of the connection weights, i.e. the BP algorithm, needs some modifications. The learning
algorithm for the FNN is described in detail in [5].

The feature of this FNN is that fuzzy rules can be extracted easily from
the trained FNN. A three-layered neural network can also identify the input-output
relationships. However, it is hard to extract rules from a three-layered neural
network.

2.2 Procedure of Hierarchical Fuzzy Modeling 

Fig. 2(a) shows an example of a hierarchical fuzzy model which consists of FNN
sub-models. The figure shows a case where the model has 5 inputs ($x_1, x_2, x_3, x_4, x_5$),
one output ($y$) and a two-level hierarchical structure. In Fig. 2, $y^{1\text{-}1}$, $y^{1\text{-}2}$ and $y$ are
the inferred values of the fuzzy sub-models. In the 1st layer, the fuzzy model
with the inputs $x_1$ and $x_2$ and the model with $x_3$ and $x_4$ are arranged in
parallel. The outputs of these models are $y^{1\text{-}1}$ and $y^{1\text{-}2}$, respectively. These two
fuzzy sub-models in the 1st layer contribute greatly to the input-output
relationships of the system. The input variable $x_5$ is used for a small adjustment of
the model. This fuzzy model reduces the number of divisions of each input space
by constructing sub-models in a hierarchical manner. This reduction of divisions
prevents the model from over-fitting. The obtained fuzzy model, therefore,
generalizes well.

The authors have proposed a hierarchical fuzzy modeling method using the
FNNs and GA. Each sub-model was built by the FNN, and a proper set of
input variables and sets of membership functions for the sub-model were selected
by GA. A hierarchical structure was constructed by finding proper sub-models
one by one.

This paper proposes a new hierarchical fuzzy modeling method which realizes
uneven division of input space. GA is utilized to find an appropriate set of input
variables. A model having the selected variables is generated by the proposed
uneven division method of input space, and it is tuned by the FNN learning. The
procedure of this hierarchical fuzzy modeling is as follows:


1. The input-output data are divided into two groups A, B whose statistical 
characteristics are nearly the same. The group A constitutes training data, 
and the group B is used as a test set. The model identified from the data of 
group A is called model A. The number of layer h is initially set at 1. 

2. Using GA and the FNN, a sub-model is identified. A combination of input
variables is encoded into a chromosome as shown in Fig. 2(b). The number of
genes $I_g$ is the same as the total number of candidate input variables. If the
value of a gene is 1, the corresponding input variable is used for the model.
If it is 0, the input variable is not selected. The evolution of individuals is
carried out by the following procedure:

(a) Chromosomes are initialized to have 0 in each gene. The number of
chromosomes is $n_g$. The binary number in each gene is flipped to 1 with
probability $p_1$.

(b) Chromosomes are evaluated. The chromosome determines a combination
of input variables to be used for the sub-model. The division process of
the input space, described in subsection 2.3, is carried out. In this
process, the identification of singletons in the consequent parts of the FNN
is done with the data of group A. The performance index $F$ used in this
fuzzy modeling process is given by

$$F = \frac{1}{n_B}\sum_{i=1}^{n_B}\left(y_i^{B} - y_i^{BA}\right)^2 + kS \tag{5}$$

where $n_B$ is the number of the data of group B, $y_i^{B}$ is the output data
of group B, $y_i^{BA}$ is the inferred value of model A with data B, and $S$ is
the number of subspaces. The first term evaluates the generality of the
identified model, and the second term is a criterion for the conciseness
of the model. The coefficient $k$ adjusts the weights on the generality and the
conciseness.

An appropriate division of input space for each input variable is obtained
as described in subsection 2.3.

(c) Individuals are ranked with this performance index. The worst $n_w$
chromosomes are replaced with copies of better chromosomes.

(d) Crossover and mutation operations are applied to the population.
Crossover is applied to the whole population except for one elite at the rate of
$P_c$. Parents are randomly selected, and one-point crossover with a randomly
selected crossover point is applied. Mutations are applied to each gene
of all the chromosomes except for those of the elite chromosome at the
rate of $P_m$.

(e) Stop if the performance of the elite chromosome does not improve during
$n_{end}$ generations. Otherwise, go to step (b).

The next step is fine tuning of the acquired model by the back propagation
learning of the FNN [9]. The membership functions in the antecedent as well
as the singletons in the consequence of the best model found in the above
process are modified to obtain a better model. For the stopping condition of
this learning, the following criterion is used:



where $n_A$ and $n_B$ are the numbers of data in groups A and B, respectively; $y_i^{A}$ and
$y_i^{B}$ are the outputs of data A and data B, respectively; $y_i^{AA}$ and $y_i^{BB}$ are
the inferred values of model A with data A and of model B with data B,
respectively; $y_i^{AB}$ and $y_i^{BA}$ are the inferred values of model B with group A
and of model A with group B, respectively. The first term on the right
hand side of eq. (6) is the precision of the model, and the second term is the
criterion for evaluation of the generality of the model.

This identified model is called model h-1 and its output is denoted by $y^{h\text{-}1}$.
The model h-k means that it is the k-th model in the h-th layer.

3. If the number of remaining input variables that are not used in the model
h-1 is more than two, another model to compensate for the error of model h-1
is identified using GA. This error is used as the teaching signal for the next
model. Some of the remaining input variables would be selected. The fine
tuning of the acquired model is then carried out. The model is denoted by
model h-2. If more variables remain, model h-3 for the compensation of the
error of model h-2 will then be identified. This modeling is repeated until
the number of remaining input variables becomes less than two. The model
h-1 will be used for the identification of the models in the succeeding layer.
The outputs of the models h-2, h-3, ... will be the candidates for the input
variables of the models in the next layer.

4. Fuzzy modeling in the next layer is done using the sub-models identified in
the previous layer. The output of model h-1 is always used. A combination
of this output $y^{h\text{-}1}$ and some of the outputs of models h-2, h-3, ... as well
as the input variables not used in model h-1 is chosen by GA. The acquired
model is denoted by (h+1)-1.

5. The evaluation criteria C of model h-1 and model (h+1)-1 are compared.
If the latter value is smaller, repeat the procedure from step 3. If not, stop. Since
the input variables which are used in the previous layer are not used in the
succeeding layers, the acquired structure is simple.
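The evolutionary loop of step 2 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: `score` is a stand-in for the performance index F (lower is better), which in the paper requires the input-space division and FNN identification; the worst chromosomes are replaced by copies of the best ones; and parents are paired by shuffling. All function names are hypothetical.

```python
import random

def init_population(n_g, i_g, p_1, rng):
    """All-zero chromosomes whose genes are flipped to 1 with probability p_1."""
    return [[1 if rng.random() < p_1 else 0 for _ in range(i_g)]
            for _ in range(n_g)]

def next_generation(pop, score, n_w, p_c, p_m, rng):
    """One cycle of steps (c) and (d); the elite is returned first."""
    ranked = sorted(pop, key=score)
    # (c) replace the worst n_w chromosomes with copies of the best n_w
    ranked[len(ranked) - n_w:] = [list(c) for c in ranked[:n_w]]
    elite, rest = list(ranked[0]), [list(c) for c in ranked[1:]]
    # (d) one-point crossover on randomly paired non-elite chromosomes
    rng.shuffle(rest)
    for a, b in zip(rest[0::2], rest[1::2]):
        if rng.random() < p_c:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
    # (d) bit-flip mutation on every gene of the non-elite chromosomes
    for c in rest:
        for i in range(len(c)):
            if rng.random() < p_m:
                c[i] = 1 - c[i]
    return [elite] + rest

def evolve(score, i_g, n_g=20, n_w=2, p_1=0.1, p_c=0.85, p_m=0.01,
           n_end=10, seed=0):
    """Run until the elite has not improved for n_end generations (step (e))."""
    rng = random.Random(seed)
    pop = init_population(n_g, i_g, p_1, rng)
    best, stale = list(min(pop, key=score)), 0
    while stale < n_end:
        pop = next_generation(pop, score, n_w, p_c, p_m, rng)
        if score(pop[0]) < score(best):
            best, stale = list(pop[0]), 0
        else:
            stale += 1
    return best
```

For a quick trial, a toy score such as the Hamming distance to a fixed mask can replace F; the default parameter values are those reported for the numerical experiment.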

2.3 Unequal Division of Input Space 

Unequal division of input space for the fuzzy modeling is described in this section.
The input space is divided so that the variations of data outputs across the
subspaces are minimized. The procedure of input space division is as follows:

1. If the given data have $I$ input variables, the $I$-dimensional input space is to
be divided. The input space initially has no division, i.e. the number of
subspaces $S$ equals 1. Figure 3 shows a case where two inputs $x_k$, $x_l$ are given.
The top figure shows the initial division of input space. A fuzzy model with
one rule is made with an FNN, trained, and evaluated under the criterion
$F$ in eq. (5). For efficiency, the training of the FNN here adjusts only the singleton in
the consequence.

Fig. 3. Division of input space (panels: $I = 2$, $S = 1$, $d = 3$; $S = 2$)

2. The input space is divided. $S$ is increased by 1, i.e. $S \leftarrow S + 1$. The number of possible
division points on each axis, $d$, is given a priori. Figure 3 shows a case with
three possible division points, $d = 3$. There are $I \times S \times d$ possible division
points in this case. Each division is evaluated, as shown in the figure, with
the criterion given by

$$V = \sum_{s=1}^{S} V_s \tag{7}$$

$V$ is the accumulation of the variation of data outputs in each subspace. The
variation in a subspace $s$ is expressed as:

$$V_s = \frac{1}{n_s}\sum_{j=1}^{n_s}\left(y_j - \bar{y}_s\right)^2, \qquad \bar{y}_s = \frac{1}{n_s}\sum_{j=1}^{n_s} y_j \tag{8}$$

where $n_s$ is the number of data in the subspace $s$, and $y_j$ and $\bar{y}_s$ are the data
output and the average of the data outputs in this subspace, respectively.
The division with the smallest $V$ value is selected. An FNN with $S$ rules is
generated, trained, and evaluated. For efficiency, the training of the FNN adjusts only
the singletons in the consequence.

3. If $F$ is not improved, stop. The model obtained before the above step is
selected as the final fuzzy model. If it is improved, step 2 is repeated.
Figure 4 shows an example of unequal division of the input space $x_k$, $x_l$, and
the membership functions constructed by the FNN.



(a) Unequal division (b) Membership functions
Fig. 4. Obtained division and membership functions ($I = 2$, $S = 5$, $d = 3$)
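The selection of a division point in step 2 can be illustrated on a single axis. The sketch below is a simplification under assumptions: it splits one subspace along one input only, it takes the variation of a subspace to be the mean squared deviation of its outputs, and it sums the variations of the two resulting subspaces. The names `variation` and `best_cut` are hypothetical.

```python
def variation(ys):
    """Mean squared deviation of the outputs in one subspace (0 if empty)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_cut(xs, ys, cuts):
    """Try each candidate cut on one axis; return (V, cut) with the smallest V."""
    best = None
    for cut in cuts:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        v = variation(left) + variation(right)
        if best is None or v < best[0]:
            best = (v, cut)
    return best

# Outputs jump between x = 0.3 and x = 0.7, so the middle cut is preferred.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
result = best_cut(xs, ys, cuts=[0.25, 0.5, 0.75])
```

Repeating this choice over every axis and every existing subspace, and keeping a new division only while F improves, gives the uneven partition described above.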



3 Numerical Experiment 

Numerical experiments are discussed in this section. Model 1-1 was identified 
with the coefficient k in eq.® varied from 10“® to 1. The object nonlinear 
system was given by the following equation: 

y = (-4 -h -I- + 5 sin {x 2 + xs) 

-I- exp (1 -h 0:4 -I- X 5 ) (9) 

= (—4 -h a:io)^ -b 5sin (a:ii) -b exp (1 -b 0 : 12 ) 

xe = a ; o ^, xt = x^^, xs = x^‘^, 

Xg = Xq^X^^, xig = Xq® -b Xu = Xg + X 3 , (10) 

Xi 2 = X 4 + X 5 

where $x_0$ to $x_5$ were input variables. $x_6$ to $x_{12}$ were arranged input variables
expressed by eq. (10). $x_{13}$ was used as a dummy variable that had no relationship
with $y$. The number of candidate inputs $I_g$ was 14. The ranges of the input
variables are shown in Table 1. The ranges were decided so that the input variables
influenced the output nearly equally.



Table 1. Range of variable 



variable          range

$x_0$             {0, 1, ..., 20}
$x_1$             {0.5, 0.7, ..., 1.5}
$x_2$, $x_3$      {-1.0, -0.9, ..., 5.0}
$x_4$             {0.0, 0.1, ..., 0.5}
$x_5$             {-1.5, -1.4, ..., 0.5}
$x_{13}$          {0, 1, ..., 17}



Eighteen sets of eighty pairs of input-output data were generated. The input-output
data were normalized within the range [0,1) for the fuzzy modeling. The
parameters of GA were set as follows: $n_g = 20$, $n_w = 2$, $p_1 = 0.1$, $P_c = 0.85$,
$P_m = 0.01$ and $n_{end} = 10$. The number of possible dividing points $d$ was set at 9.
In each experiment, two of the eighteen sets were used, one as group A (training data)
and the other as group B (test data). Nine experiments were done.

The input variables of the model 1-1 identified by the proposed method were
$x_{10}$ and $x_{11}$. The input space was divided into 7 subspaces. Fig. 5(a) shows the
division of input space obtained with the proposed method. The mean square
error (MSE) for test data was 0.0199.

The model 1-1 identified with the conventional method in [7] also had the
same combination of input variables, $x_{10}$ and $x_{11}$, and the numbers of membership
functions were 6 for $x_{10}$ and 7 for $x_{11}$. The number of subspaces was 42. Fig. 5(b)
shows the division of input space with the conventional method. The MSE was
0.0108. The obtained models are summarized in Table 2.



Table 2. Comparison of the models 1-1 





                     proposed method      conventional method

selected variables   $x_{10}$, $x_{11}$   $x_{10}$, $x_{11}$
mean square error    0.0199               0.0108
number of rules      7                    42



(a) proposed method (b) conventional method
Fig. 5. Division of Input Space



4 Conclusion 

This paper presented a hierarchical fuzzy modeling method with FNN and GA. In the
numerical experiment, the proposed method generated a much more concise model than the
conventional method (7 rules against 42), with a test MSE of 0.0199 against 0.0108.
Numerical experiments show that the weights on the performance
indices affect the generality and the conciseness of the obtained submodel.
These weights can control the characteristics of the obtained model. Various
models can be generated as candidates for the model by changing the
coefficient.

Abstract. An intelligent surveillance planning system must allocate available
resources to optimize data collection with respect to a variety of operational
requirements. In addition, these requirements often vary temporally (i.e., targets
of interest move, priorities change, etc.), requiring dynamic reoptimization
on-the-fly. Allocation of surveillance resources has typically been accomplished
either by human planners (for small problems of very limited complexity) or
by deterministic methods (typically producing suboptimal solutions which are
incapable of adapting to dynamic changes in the environment). The method
presented here solves these problems by using evolutionary programming to
optimize the simultaneous and coordinated scheduling of multiple surveillance
assets. The problem of allocating unmanned aerial vehicles (UAVs) to acquire
temporally variable, time-differential intelligence data is addressed. Imposition
of realistic constraints ensures solution feasibility in real-world problems. This
implementation can be modified to optimize solutions for a suite of different
surveillance asset types, such as manned vehicles and satellites.



1 Introduction 

Airborne reconnaissance involves allocating a limited number of surveillance assets 
to a set of ‘targets of interest’ . Although aerial surveillance is often accomplished by 
means of human pilots, there are times when other intelligence gathering methods are 
required. Unmanned aerial vehicles (UAVs) provide the capability to gather 
intelligence in areas where inherent danger/threats would present an unacceptable risk 
to human pilots. In addition, they can often operate more covertly than manned 
aircraft. Whether under human or autonomous control, the goal is to obtain the 
desired information within the imposed constraints: available fuel capacity, turning 
rates, desired sensing/imaging requirements, etc. In most cases the surveillance asset 
is required to return to its original base, in effect completing a ‘tour’ of the assigned 
targets of interest. Hence, the problem is analogous to a set of multiple constrained 
traveling salesman problems (TSP) solved in parallel, albeit with notable differences 
due to the specific problem domain. 

Time-differential imagery is important to reconnaissance as it is often the change in
an image (e.g., movement of targets) which is of interest to the analyst. Thus, in this
TSP it is necessary to revisit a target of interest on a periodic basis to obtain
time-differential images. When multiple surveillance assets are available, their flight paths
may be optimized over and around the targets of interest cooperatively, i.e., desired
imaging schedules are met via a combination of assets. The problem can be
formulated as a simultaneous tour assignment for a set of UAVs such that the solution
optimizes the mission goals. Importantly, this formulation permits combining both
UAVs and overhead assets (i.e., satellites) within one optimization framework, as
opposed to solving the problems separately (and perhaps, suboptimally). Providing
for simultaneously optimizing allocation of multiple asset types represents a
capability that was not possible with prior methods.




Fig 1. Geographic plot of a multi-target UAV reconnaissance scenario with targets of interest 
color coded by priority. Elliptical regions define threat zones. 

In the real world, no intelligence gathering system operates in a completely covert 
manner. Lack of covert operation often poses a significant risk to reconnaissance 
vehicles. UAVs are often tasked to fly over hostile territories. Surface-to-air missile 
(SAM) sites pose a threat to any flying object within range of the missile. Minimizing 
threat exposure is important since the UAV may have to return to its base intact for its 
reconnaissance information to be usable. Potential threats from SAM sites are 
generally known with some degree of precision so their presence can be incorporated 
into a path evaluation function. Fig. 1 shows a typical scenario with a target-rich 
environment (targets of opportunity are indicated by various sized shaded rectangular 
shapes). A set of spatio-temporal paths for multiple UAVs (based at two 
geographically separate locations) must be planned over this hostile environment. In 
this example, over 100 targets of opportunity must be imaged within the desired 
surveillance time, fuel, and acceptable threat exposure constraints. 

2 Background 

This problem belongs to the class of NP-complete problems. The search space of
candidate solutions for even small-sized problems is too large for direct enumeration.
Given $N$ surveillance platforms and $K$ targets, there are $N^2 (K-1)!$ potential
solutions. Applicable feasibility constraints reduce this somewhat, but for a typical
problem with only 5 UAVs and 100 targets, there exist over $2.3 \times 10^{157}$ possible
solutions. Branch and bound techniques [8] are applicable for reducing the size of the
search space, but typically they only reduce the number of possible search paths by a
few orders of magnitude.
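The size of this search space is easy to evaluate exactly with arbitrary-precision integers. The sketch below reads the expression above as $N^2 (K-1)!$; the function name `candidate_solutions` is illustrative.

```python
from math import factorial

def candidate_solutions(n_platforms, k_targets):
    """Number of candidate solutions, taken as N^2 * (K - 1)!."""
    return n_platforms ** 2 * factorial(k_targets - 1)

count = candidate_solutions(5, 100)
print(f"{count:.2e}")  # prints 2.33e+157
```

Even a reduction by several orders of magnitude, as offered by branch and bound, leaves a space far beyond exhaustive search.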

Chief among the known approaches to solving this problem is the ubiquitous greedy 
method wherein targets are first prioritized then assigned for imaging. Assignment 
proceeds through the list until all available resources (e.g., fuel) are exhausted. This 
method forms the basis for the rank-trimming (RT) algorithm. Targets are first ranked 
using a target value function. UAVs are then assigned to visit the rank-ordered targets 
with respect to the desired target imaging periodicity. Because of the heuristic UAV 
and target assignment strategy, this rarely results in an optimal solution. Variations of 
this technique include scheduling each UAV according to a ‘best-fit’ metric which 
attempts to maximize utilization (total imaging time). 

Methods for solving the TSP for a single entity range from neural network algorithms 
[6] to techniques incorporating variants of evolutionary computation [3], [4]. Blanton 
and Wainwright [2] solved a related multiple vehicle routing problem (VRP) using an 
order-based genetic algorithm. Their approach optimized assignment of routes for 
multiple vehicles visiting a set of customers in prescribed time windows, given a 
single start/terminal point for all vehicles. Although similar in concept, the 
formulation did not allow for multiple visits to any site, utilized fixed time windows, 
and required all vehicles to choose tours constructed from a set of defined (i.e., 
geographically fixed) paths. 

All of the aforementioned methods have deficiencies in one or more of the following 
areas: 1) generation of optimal solutions for multiple, cooperative, time-coordinated 
entities, 2) creating integrated solutions for multiple asset types (i.e., overhead 
sensors), 3) incorporating revisitation (re-imaging) of target sites for time-differential 
surveillance information, and 4) scalability to larger (real-world) problem sizes. 

Porto and Fogel [7] demonstrated the use of evolutionary programming (EP) [5] as an 
optimization technique to generate optimal cooperative behaviors for multiple 
vehicles with respect to a set of mission goals. Complex, interactively intelligent 
behaviors which optimized vehicle routes, firing capabilities, and variable action 
sequences were generated. The evolved vehicle routes were only constrained by 
terrain and the physical dynamics of the vehicles. Obvious similarities between 
evolving cooperative behaviors and the assignment of flight tours for multiple 
surveillance assets inspired confidence that EP is well suited for solving this problem. 

3 Technical Approach 

There are several key aspects to developing a successful evolutionary optimization 
solution to this problem. First, a suitable representation must be defined. The chosen 
formulation represents the tour for each UAV as an ordered sequence of targets to 
image. The tour for each UAV is a unique list instantiation. Flexible (and more 
realistic) cooperation between multiple UAVs is achieved by incorporating 
individually variable UAV tour start times. Starting and ending points in the list are 
constrained to be the initial base of the vehicle. Replication of targets in these lists 
allows for multiple revisits of targets (time-differential re-imaging). The number of 
replications necessary, R, can be defined as follows: 

$$R = F_t / K_{min} \qquad (1)$$

where $F_t$ is the maximum possible flight (or surveillance) time over all UAVs, and 
$K_{min}$ is the smallest desired imaging periodicity. This representation is also extensible 
to include overhead assets with potentially variable imaging schedules. Once the list 
sequences are created, evaluation consists of calculating the number of targets 
imaged, adjusted by priority, with respect to desired imaging periodicities. Similarly, 
constraints (i.e., remaining fuel and survivability) can be evaluated by sequential 
checks through the cumulative flight distance and exposure through threat zones. 
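
The list representation and Eq. 1 can be illustrated with a minimal Python sketch. The function names are hypothetical, and rounding R up to a whole number of replications is an assumption, since the text does not say how a fractional ratio is handled.

```python
import math
import random


def replication_count(max_flight_time, min_periodicity):
    """Eq. 1: R = F_t / K_min, rounded up (the rounding is an assumption)."""
    return math.ceil(max_flight_time / min_periodicity)


def initial_list(targets, r, rng):
    """Build one UAV's candidate list: every target appears r times, in random order.

    The base is implied at both ends of the list and is not stored here.
    """
    entries = [t for t in targets for _ in range(r)]
    rng.shuffle(entries)
    return entries
```

For example, with a 24 hour surveillance period and a smallest periodicity of 4 hours, each target would appear six times in each list under this sketch.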

Path sequences for each UAV are aggregated into a single solution that is evaluated as 
a whole. Due to re-imaging requirements as well as cooperative capabilities (when 
incorporating multiple UAVs), individual UAV path sequences cannot be evaluated 
independently. Evolutionary programming provides an efficient optimization method 
for this problem. Because EP optimizes total behaviors instead of individual parts, it 
is well suited to exploit the inherent interdependence of the UAV imaging schedules. 

4 Implementation 

Four separate phases of flight (each with a unique but constant flight speed) are 
modeled, including ramp-up/down, ingress (to the surveillance area), steady-state 
flight, and egress (flight back to the base). Associating a specific starting base for 
each UAV allows optimizing for multiple UAVs launched from a variable number of 
bases. Fuel consumption rates are a function of the specific phase of flight. No fuel 
penalty is imposed for changing direction. Refueling is not permitted during flight. 

Targets of interest are modeled as stationary geographic points in space with assigned 
priorities, desired imaging periodicities, and image qualities. Required imaging times 
are a function of both target size and required image quality. Since the angle of 
inspection (imaging angle) is not currently modeled in this study (it is assumed that 
UAV images can be taken directly over targets), UAV tours can be specified 
efficiently as point-to-point paths. Additionally, the time to the first (closest) target is 
assumed to be greater than the time required to reach an operational imaging altitude. 

Direct routing of UAVs to and from a set of predefined targets leads to paths that may 
intersect threat zones. Way-points which can delineate paths around an object are 
incorporated to address this problem. These ‘pseudo-targets’ have the same structure 
as regular targets except that their imaging times, priorities, and revisitation 
periodicities are all ignored. In this way, the addition (or deletion) of appropriate way- 
points in the tours allows the model to permit maneuvering around threat zones. 

Probability of survival is a function of the length of time a UAV spends traveling 
through threat zones. Threat zones are modeled as 3-D cylindrical regions extending 
through the UAV vertical flight envelope. Any path intersecting a threat zone 
boundary is assigned a probability of kill, Pk, proportional to the length of the path in 
the zone. A path intersecting a threat zone through its center (maximal exposure to the 
threat) is assigned a zero survival score. Path segments not intersecting threat zones 
are survivable with probability one (mechanical failures are not modeled). The 
cumulative survival probability over the entire flight tour for UAV j can then be 
expressed as follows: 

$$Ps_j = \prod_{i=1}^{M} \left(1 - a_{ij}\, Pk_{ij}\right) \qquad (2)$$

where 

$M$ = total number of threats in the scenario 
$R_i$ = kill radius of threat i 
$C_{ij}$ = chord length of UAV j passing by threat i 
$Pk_{ij}$ = probability of threat i killing UAV j = $C_{ij} / 2R_i$ 
$a_{ij} \in \{0,1\}$ denotes the vulnerability of UAV j to threat i 

Each pass of a UAV through a threat zone is treated as an independent event allowing 
paths to intersect threat zones multiple times in a single tour. In addition, feasibility 
constraints dictate that all UAVs must have sufficient fuel to return to their bases. At 
least one target is assumed to be within the range (including return) of each 
originating base point. UAV tours with zero survivability are feasible but have zero 
fitness score. No advantage is gained for returning to base with more than the 
minimum fuel load. 
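
The survival model of Eq. 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: threat zones are treated as circles in the plane, each straight path segment inside a zone counts as one independent pass, and the function names and the single `vulnerable` flag are hypothetical.

```python
import math


def chord_length(p0, p1, center, radius):
    """Length of the part of segment p0 -> p1 that lies inside a circular threat zone."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the zone
    root = math.sqrt(disc)
    t_in = max(0.0, (-b - root) / (2.0 * a))   # clip entry point to the segment
    t_out = min(1.0, (-b + root) / (2.0 * a))  # clip exit point to the segment
    return max(0.0, t_out - t_in) * math.sqrt(a)


def survival_probability(path, threats, vulnerable=1):
    """Product over passes of (1 - a * Pk), with Pk = chord / (2 * kill radius)."""
    ps = 1.0
    for p0, p1 in zip(path, path[1:]):
        for center, radius in threats:
            pk = chord_length(p0, p1, center, radius) / (2.0 * radius)
            ps *= 1.0 - vulnerable * pk
    return ps
```

A straight path through the center of a zone has a chord equal to the diameter, so Pk is one and the survival score is zero, in agreement with the text. Note that a way-point placed inside a zone would split one pass into two factors in this sketch.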

4.1 Mutation 

Given the sequential target-list problem representation, numerous mutation operators 
are possible which span the range of small to large jumps through the search space. 
Unless otherwise specified, random selection of list entries indicates selection with 
equal probability for all outcomes. The mutation operators implemented are: 

1) Swap two adjacent randomly selected targets in the list. 

2) Swap two non-adjacent randomly selected targets. 

3) Move a randomly selected target to the bottom of the list. 

4) Add/Delete a way-point to/from the list at a randomly selected point. 

5) Modify a randomly selected way-point by random alteration of its position. 

6) Alter the starting launch time for a randomly selected UAV. 

The mutation operator which adds a new pseudo-target (way-point) to the list utilizes 
the position(s) of adjacent target points in the list. New pseudo-target positions are 
randomly generated within a bounding circle with radius equal to the distance 
between adjacent target points in the list. Mutation of pseudo-target locations is 
accomplished by adding a random variable N(0,1) in $\Re^2$. Mutation of initial UAV 
launch times uses addition of a N(0,1) random variable to the existing starting time. 
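
The list-reordering operators (1 to 3) and the launch-time operator (6) can be sketched as below. This is a hedged illustration with hypothetical function names; the way-point operators (4 and 5) are left out because the text does not fully specify where the bounding circle is centered. Each function returns a new list and leaves its input unchanged, and lists are assumed to hold at least three entries.

```python
import random


def swap_adjacent(tour, rng):
    """Operator 1: swap a randomly selected target with its successor."""
    t = list(tour)
    i = rng.randrange(len(t) - 1)
    t[i], t[i + 1] = t[i + 1], t[i]
    return t


def swap_non_adjacent(tour, rng):
    """Operator 2: swap two randomly selected targets that are not neighbours."""
    t = list(tour)
    pairs = [(i, j) for i in range(len(t)) for j in range(i + 2, len(t))]
    i, j = rng.choice(pairs)  # every non-adjacent pair is equally likely
    t[i], t[j] = t[j], t[i]
    return t


def move_to_bottom(tour, rng):
    """Operator 3: move a randomly selected target to the end of the list."""
    t = list(tour)
    t.append(t.pop(rng.randrange(len(t))))
    return t


def mutate_launch_time(start_time, rng):
    """Operator 6: add an N(0,1) random variable to the existing launch time."""
    return start_time + rng.gauss(0.0, 1.0)
```

Because the operators only permute list entries, the number of replications of each target is preserved across generations.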

4.2 Constructing a Feasible Solution 

Construction of initial list sequences is accomplished by one of three methods: 1) list 
sequence generation via random selection, 2) sequence generation based upon ranked 
target priorities (greedy method), and 3) sequence generation as read from file. Thus, 
heuristic solutions can be incorporated allowing for direct comparison with existing 
resource allocation methods. After creating the target-list sequences, solution 
construction proceeds by selective inclusion of targets into a feasible tour. Feasibility 
constraints are checked as each target (or way-point) entry is read from the list. 
Available fuel (a critical constraint) is constantly updated throughout this process. If 
no constraints are violated, the target is added to the tour and the next potential target 
point is examined. Upon violation of any constraint, the potential target is skipped, 
and the next target in the list is examined. Since targets are replicated R times in each 
list, solutions are generated which allow UAVs to visit a target more than once. 
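
The construction step can be sketched as a single pass over a candidate list. The sketch below is a simplification under stated assumptions: fuel is expressed as a remaining flight distance, the only constraint checked is the ability to return to base, and the names `build_tour` and `positions` are hypothetical.

```python
import math


def build_tour(base, sequence, positions, fuel_range):
    """Read targets in list order; keep each one that still allows a return to base."""
    tour, pos, remaining = [], base, fuel_range
    for name in sequence:
        nxt = positions[name]
        leg = math.dist(pos, nxt)
        if leg + math.dist(nxt, base) > remaining:
            continue  # constraint violated: skip this entry, examine the next
        tour.append(name)
        remaining -= leg
        pos = nxt
    return tour
```

Entries that are skipped stay in the list, so a later mutation can move them to a position where they become feasible.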

4.3 Fitness Evaluation 

Solution fitness is evaluated after constructing tours for all available UAVs. The 
fitness function, which addresses the requirement for time differential target imaging, 
is defined by calculating user satisfaction as a function of time. User satisfaction at 
time tk can be expressed as follows: 

$$S(t_k) = \sum_{i=0}^{N} \left(Tv_i \cdot Ps_j\right) \qquad (3)$$

where target value for the ith target, imaged by UAV j, is defined as: 

$$Tv_i = \alpha \cdot Pr_i + \beta \cdot A_i \qquad (4)$$

$N$ = total number of targets in the scenario 

$Pr_i$ = priority of target i 

$A_i$ = normalized area of target i 

$Ps_j$ = cumulative survival probability of UAV j 

$\alpha, \beta \in [0.0, 1.0]$ 

The fitness function, summed over the entire time of the desired surveillance period, is 

$$F = \sum_{k} S(t_k) \qquad (5)$$

The formulation of Eq. 3 allows for implementing the fitness function calculation 
over various time resolutions by discretizing over different time periods. This 
facilitates inclusion of the concept of an acceptable time-window for imaging a target 
with respect to its previous imaging time. Incorporating survival probabilities results 
in a penalty function for targets imaged by UAVs whose paths intersect threat zones. 
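
A minimal sketch of Eqs. 3 to 5 follows. The data layout is an assumption made for illustration: `schedule` maps each discrete time step to the list of (priority, normalized area, UAV id) records for targets imaged at that step, and `survival` maps a UAV id to its cumulative survival probability. The default weights of 0.5 are arbitrary.

```python
def target_value(priority, area, alpha=0.5, beta=0.5):
    """Eq. 4: weighted sum of target priority and normalized target area."""
    return alpha * priority + beta * area


def satisfaction(imaged, survival, alpha=0.5, beta=0.5):
    """Eq. 3: user satisfaction at one time step."""
    return sum(target_value(p, a, alpha, beta) * survival[uav] for p, a, uav in imaged)


def fitness(schedule, survival, alpha=0.5, beta=0.5):
    """Eq. 5: satisfaction summed over all time steps of the surveillance period."""
    return sum(satisfaction(imaged, survival, alpha, beta) for imaged in schedule.values())
```

A target imaged by a UAV with survival probability 0.5 contributes half of its target value, which is the penalty effect described above.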

5 Experiments and Results 

Experiments with a variable number of threat zones, target distributions, target 
priorities, and desired image periodicities were performed. Six UAVs were simulated, 
each with identical physical capabilities. Forty targets were (uniformly) randomly 
distributed in the (10,000 geographic units squared) searchable domain. Target 
priorities (in the range of [1-10]) and target areas, $A_i \in$ {2x2, 5x5, or 10x10 
geographic units}, were randomly selected using a uniform distribution. In all tests, a 
24 hour surveillance period was used, and each UAV was capable of reaching the 
farthest target in the search area and returning to base within its available fuel limit. 
When used, three threat zones (each with a radius of 50 geographic units) were modeled. 

EP without self-adaptation of mutation parameters was used with a population of 100 
parents, each generating 5 offspring per generation. Selection of mutation operators 
was from a uniform random distribution. After scoring each candidate solution, 
tournament selection probabilistically culled the least-fit members from the 
population (tournament size 10). The RT greedy algorithm was used to initialize the 
starting population. This provided a performance benchmark for comparison with the 
evolved solutions (indicated by the initial fitness values at generation 0 in Fig. 2). 
Results presented below were averaged over all 20 runs for statistical validation. 
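
The generational loop described above can be sketched as follows. This is a generic EP skeleton under stated assumptions, not the code used in the study: the tournament counts wins against q randomly drawn opponents, ties in wins are broken by fitness (an added assumption that makes the best solution always survive), and `fitness` and `mutate` are caller-supplied placeholders.

```python
import random


def ep_select(scored, mu, q, rng):
    """EP-style tournament: each solution meets q random opponents; most wins survive."""
    ranked = []
    for fit, ind in scored:
        opponents = rng.sample(scored, q)
        wins = sum(fit >= o_fit for o_fit, _ in opponents)
        ranked.append((wins, fit, ind))
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [(fit, ind) for _, fit, ind in ranked[:mu]]


def evolve(init, fitness, mutate, mu=100, offspring=5, q=10, generations=50, seed=0):
    """Return (best fitness, best individual) after the given number of generations."""
    rng = random.Random(seed)
    pop = [(fitness(x), x) for x in init]
    for _ in range(generations):
        children = [mutate(x, rng) for _, x in pop for _ in range(offspring)]
        scored = pop + [(fitness(c), c) for c in children]
        pop = ep_select(scored, mu, q, rng)
    return max(pop, key=lambda p: p[0])
```

Seeding `init` with heuristic solutions, as done here with the RT algorithm, means that generation 0 reproduces the benchmark and any later gain is attributable to evolution.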




Fig. 2. Graph showing evolved fitness scores vs. generation (averaged over 20 runs). 

Algorithmic performance with no requirement for revisiting target sites (all target 
periodicities, K, set equal to 24 hours) was tested in scenarios with and without threat 
zones. Evolved solutions quickly exceeded the performance of the RT method both 
without and in the presence of threat zones. The average number of added way-points 
(per UAV) in the solution for the no-threat zone scenario was less than one indicating 
evolution of relatively efficient solutions. Complexity of evolved solutions (as 
measured by the number of way-points added to a solution) increased with the 
introduction of threat zones, to an average of 6.4 per UAV. 




222 V.W. Porto 



Next, re-imaging requirements for all targets were set with periodicity K=12 hours. 
Results of this more constrained problem showed lower overall fitness scores as 
expected due to the imposed re-imaging constraints. Interestingly, the number of way- 
points in the no-threat tests increased to an average of 11.4 per UAV. Analysis 
indicated the additional maneuvering manipulated total flight times to increase the 
satisfaction of imposed target revisitation schedules. Finally, solutions for scenarios 
with targets of interest with multiple periodicities were also evolved. Each target was 
assigned one of three periodicities (K = 4, 6, or 12 hours) selected from a uniform 
distribution. Again, significant improvements over the existing RT solutions were 
obtained (> 50% increase in user satisfaction) after relatively few generations. 

6 Conclusion 

Results of this evolutionary programming approach demonstrate performance that is 
significantly better than that obtained with current typical algorithms. Solutions for 
realistic numbers of surveillance assets, targets, and threats are possible within a few 
hours of computational time. Convergence to optimal or near-optimal solutions on 
problems with a realistic number of targets is now within the capabilities of modern 
computers. The inherent design of the EP algorithm is also well suited for 
implementation on parallel processing machines. This research provides a means for 
optimizing the allocation of multiple UAVs with respect to the reconnaissance goals 
and environmental constraints. Perhaps of greatest importance is the ability to 
incorporate other surveillance platforms and sensor types into the simulation. The 
list-based object-oriented design facilitates this extensibility. Future research will include 
increasing model fidelity and integration with other sensor suites (i.e., satellite assets). 

Abstract. Genetic Algorithms (GA) are applied to evolutionary neural 
networks to control a rolling inverted pendulum. The task is to control 
the driving force of a cart, to which one end of a pole is jointed by a 
rotary shaft, so as to roll the pole up from its initial hanging state 
and to keep the pole standing in the inverted position. 
The controller is a multilayer perceptron (MLP) with three layers whose 
weight coefficients are evolved and optimized by GA. 

Experiments for evolving the weights of two types of MLPs are conducted 
and their results are compared. At the same time, the effect of the weight 
ranges of neural networks on evolutionary results is investigated. In these 
evolutionary experiments, MLPs are generated that successfully control 
the driving force of the cart to roll the pole up and stand it upright. 
The MLPs also acquire intelligent control patterns with a few swings, whose 
number varies with the maximum driving force of the cart. 



1 Introduction 

The pole-balancing problem has been attempted many times previously with various 
control methods including PID control, nonlinear control, fuzzy control, and neural 
networks EH. The target of these attempts is to find the control rules that make a 
pole stand up at a certain position from the initial state of pole angles between -π/2 
and π/2 when the angle of the upright pole is assumed to be zero. 

In this paper, our target is to get an MLP (multilayer perceptron) or a feed-forward 
neural network to control the driving force of a cart in order to make a pole stand up 
at a certain position from the hanging state (i.e. the pole angle is π or -π). This control 
problem requires two types of control. The first is to control the driving force 
required to roll the pole up to the upright position and the second is to control 
the force needed to keep the pole standing up. Implementing both types of control 
in a single MLP is expected to be difficult. Thierens [4] attacked this problem 
with evolutionary neural networks whose weights are restricted to ensure the second 
type of control. 

Fig. 1. A pole-cart system 



In this paper, two types of MLPs with no restrictions are evolved to find solutions 
for this problem and the effect of the range of weights in neural networks on 
evolutionary results is investigated. In addition, many MLPs with intelligent control 
patterns are found by evolution in the experiments that have four values of the 
maximum driving force. 



2 Evolutionary System 

2.1 The System Structure 

The diagram of the evolutionary system is shown in Fig. 2. GA transfers a set of 
weights to an MLP as an individual in order to evaluate it. The MLP calculates the 
driving force at each time step from the state parameters of an inverted pendulum. 
The inverted pendulum simulator calculates the status at the next time step from 
the driving force and the current status using the equations of motion. The evaluator 
observes the transition of status, calculates the fitness value with a fitness function, 
and returns it to GA. This cycle for the evaluation of an individual is repeated as many 
times as the population size. GA evolves the population of the weight sets using these 
fitness values. 

2.2 Simulation of a Rolling Inverted Pendulum 

The model of the pole-cart system in this study is simulated in two-dimensional space 
and no friction of the rotary shaft or cart sliding is assumed. The equations of motion 
given by Anderson [5] are simulated at discrete times. The velocity $\dot{x}$ of the cart and 
the angular velocity $\dot{\theta}$ of the pole at time t+1 are calculated with the Runge-Kutta 
approximation method. The position $x$ of the cart and the angle $\theta$ of the pole are 
calculated with the Euler approximation method. For this simulation, the constants 
are the time step ($\Delta t$ = 0.02 seconds), the mass of the cart ($m_c$ = 0.5 kilograms), the 
mass of the pole ($m_p$ = 0.1 kilograms), the pole length ($l$ = 0.5 meters) and gravity 
($g$ = 9.8 m/sec$^2$). 

The initial states of the cart position, the cart velocity and the pole angular velocity 
are set at 0.0. The pole angle is defined as 0 degrees when the pole is standing upright 
on the cart and the range of the angle is between -180 degrees (-π) and 180 degrees 
(π). The initial state of the angle is set at 180 degrees, which means the pole is in the 
hanging state. 
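
One simulation step can be sketched as below. The equations of motion are not reproduced in this paper, so the sketch assumes the frictionless cart-pole dynamics commonly used in the pole-balancing literature, with LENGTH taken as the distance from the pivot to the pole's center of mass. For brevity it applies a plain Euler update to all four state variables, whereas the study uses Runge-Kutta for the two velocities. All names are hypothetical.

```python
import math

G, M_CART, M_POLE, LENGTH, DT = 9.8, 0.5, 0.1, 0.5, 0.02


def accelerations(theta, theta_dot, force):
    """Cart and pole accelerations for the assumed frictionless cart-pole model."""
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    tmp = (force + M_POLE * LENGTH * theta_dot ** 2 * sin_t) / total
    theta_acc = (G * sin_t - cos_t * tmp) / (
        LENGTH * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total)
    )
    x_acc = tmp - M_POLE * LENGTH * theta_acc * cos_t / total
    return x_acc, theta_acc


def step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one time step of DT seconds."""
    x, x_dot, theta, theta_dot = state
    x_acc, theta_acc = accelerations(theta, theta_dot, force)
    x += DT * x_dot
    theta += DT * theta_dot
    x_dot += DT * x_acc
    theta_dot += DT * theta_acc
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi  # keep the angle in [-pi, pi)
    return x, x_dot, theta, theta_dot
```

The wrap in the last line is where the angle jumps between π and -π near the hanging state, which is the discontinuity discussed later for the two MLP input encodings.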

Fig. 2. Evolution system of inverted pendulum controls 



2.3 MLP Architectures 

The architecture of an MLP has three layers. The structure of a three-layer perceptron 
is expressed by a triplet ($n_i$-$n_h$-$n_o$) where $n_i$, $n_h$, and $n_o$ are the number of inputs, the 
number of neurons in the hidden layer and the number of neurons in the output layer, 
respectively. Two types of MLPs, (5-5-1) and (6-5-1), are prepared. Furthermore, 
the following sigmoid function is introduced to output the value for the range between 
-1 and +1. 

$$f(u) = \tanh u. \qquad (1)$$

The driving force of the cart is given by the following equation: 

$$\mathrm{force}(x(t), \dot{x}(t), \theta(t), \dot{\theta}(t)) = f_{NN} \times F_{max}, \qquad (2)$$

where $f_{NN}$ is the output of an MLP and $F_{max}$ is the constant which determines 
the maximum force. 
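
Eqs. 1 and 2 can be sketched as a small forward pass. The function name and the weight layout are hypothetical, and bias handling is an assumption: a constant input can be appended to `inputs` by the caller if a bias is wanted, since the text does not state how the inputs are composed.

```python
import math


def mlp_force(inputs, w_hidden, w_out, f_max):
    """Three-layer perceptron with tanh units; the output is scaled to [-f_max, f_max]."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    f_nn = math.tanh(sum(w * h for w, h in zip(w_out, hidden)))
    return f_nn * f_max
```

Because the output unit is also a tanh, large weights push the output toward the two extremes, which produces a force that is nearly on-off.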




Fig. 3. Architecture of MLP (5-5-1) 

Fig. 4. Architecture of MLP (6-5-1) 



3 Application of GA 

The Genetic Algorithm applied to this problem is the basic model of the Forking 
Genetic Algorithm [6], in which parents and offspring are selected together by the best 
N selection method. 
The weights of an MLP that form an individual are coded by a 15-bit Gray code. 
The movable range of the cart is limited to between -3.0 m and 3.0 m. When the cart is moving 
in this range, the fitness function is defined as follows: 

$$\mathrm{fitness} = \sum_{t=0}^{STEP} \ldots \qquad (3)$$

where STEP is the maximum number of simulation steps and a is the weight constant. 
If the cart runs outside of the range limit, the position of the cart and the angle of the 
pole are fixed at the point when the cart runs outside the range. This fitness function 
tends to keep the pole at the position 0.0. 



4 Empirical Study 

The objectives of this study are to verify the possibility of evolving a neural network 
that realizes the complex controls mentioned above, to investigate the effect of varying 
the neural network weight range on evolutionary results, and to observe control patterns 
when the maximum driving force is set to 20 N (newtons), 10 N, 3 N, and 1 N, 
respectively. The weights are encoded by a 15-bit Gray code and are converted to real 
values in one of three ranges: Range A (-16.384 ~ 16.383), Range B (-4.09600 ~ 4.09575), 
and Range C (-1.6384 ~ 1.6383). The experiments are conducted with the combined 
conditions of these maximum driving forces and weight ranges, and the GA parameters 
of population size = 400, crossover rate = 1.0, mutation rate = 0.002 and maximum 
trials = 100,000. 
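
The weight decoding can be illustrated with a short sketch. It assumes the standard reflected binary Gray code and a linear mapping of the 32768 code values onto each range; the step sizes (0.001, 0.00025, and 0.0001) are not stated in the text but follow from the quoted endpoints. The function names are hypothetical.

```python
def gray_to_int(gray):
    """Decode a reflected binary Gray code into an ordinary integer."""
    n = 0
    while gray:
        n ^= gray
        gray >>= 1
    return n


def decode_weight(gray, low, step):
    """Map a 15-bit Gray-coded gene to a real weight: low + k * step, k in 0..32767."""
    return low + gray_to_int(gray) * step


RANGES = {  # (lower bound, step) per weight range; steps inferred from the endpoints
    "A": (-16.384, 0.001),
    "B": (-4.096, 0.00025),
    "C": (-1.6384, 0.0001),
}
```

A single bit flip in the Gray code moves between adjacent integers in many cases, so the mutation rate of 0.002 per bit often yields small weight changes.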



4.1 Experiments and Results 

Tables 1 and 2 show the results of experiments for two neural network architectures: 
the (5-5-1) model and the (6-5-1) model, respectively. In these tables, ‘Trials’ is the average 
number of trials when a solution is found in 20 experiments; ‘No. of Swings’ is the 
number of swings contained in the control pattern of the best solution found in 20 
experiments; ‘Success rate’ is the number of times that the solution is obtained in 20 
experiments; and ‘Force Freq.’ indicates whether the frequency of the on-off control is 
high or low. 

Table 1. Results of evolutions with the (5-5-1) model 

Weight range                      F_max (N)  Trials  No. of Swings  Success rate  Force Freq. 
Range A (-16.384 ~ 16.383)        20         79069   2              1/20          high 
                                  10         2680    0.5            1/20          high 
                                  3          10331   2.5            15/20         high 
                                  1          13417   5              1/20          low 
Range B (-4.09600 ~ 4.09575)      20         -       -              0/20          - 
                                  10         2563    1.5            4/20          high 
                                  3          -       -              0/20          - 
                                  1          -       -              0/20          - 
Range C (-1.6384 ~ 1.6383)        20         -       -              0/20          - 
                                  10         -       -              0/20          - 
                                  3          -       -              0/20          - 
                                  1          -       -              0/20          - 

Table 2. Results of evolutions with the (6-5-1) model 

Weight range                      F_max (N)  Trials  No. of Swings  Success rate  Force Freq. 
Range A (-16.384 ~ 16.383)        20         43531   0              1/20          high 
                                  10         64018   0              9/20          high 
                                  3          22270   1.5            11/20         high 
                                  1          44384   2              17/20         high 
Range B (-4.09600 ~ 4.09575)      20         -       -              0/20          - 
                                  10         34815   0              7/20          high 
                                  3          29179   1.5            9/20          low 
                                  1          69340   2              10/20         low 
Range C (-1.6384 ~ 1.6383)        20         28013   0.5            1/20          high 
                                  10         11548   0.5            4/20          low 
                                  3          12798   2.5            2/20          low 
                                  1          42880   3              4/20          low 

In the standard (5-5-1) model of Fig. 3, the input signals are the position and velocity 
of the cart and the angle and angular velocity of the pole. In the (6-5-1) model of 
Fig. 4, the input signals are the same as those of the (5-5-1) model, except that the 
parameter of the angle θ is separated into the sign and the absolute value of the angle 
in order to reduce the discontinuity of the angle parameter. 

Comparing the results of Tables 1 and 2, it is clear that the success rates of the 
(6-5-1) model are better than those of the (5-5-1) model in almost all experiments based 
on the combined parameters of the maximum driving forces and weight ranges. In the 
(5-5-1) model, it is supposed that the discontinuous cliff of the angle parameter makes 
the evolution of neural networks more difficult. On the other hand, the success rate 
tends to decrease as the maximum driving force increases. It is supposed that a large 
driving force makes it difficult to stabilize the pole at the inverted position, though it 
makes it easy to roll the pole up. 

4.2 Control Patterns of ENN 

The control patterns of the evolutionary neural networks with the (6-5-1) model that 
are obtained as the solutions for the four maximum driving forces in the cases of Range A 
and C are shown in Figures 5 to 12 by simulations of motion. 

In the case of Range A, when the maximum driving force is 20 N or 10 N, the pole 
stands up in a single roll without a swing by a strong force, as shown in Fig. 5 and Fig. 6. 
However, when the maximum driving force is 3 N, the pole stands up after one and a half 
swings. Similarly, when the maximum driving force is 1 N, the pole stands up after two 
swings. In the case of Range C, similar control patterns are observed. These control 
patterns show that the kinetic energy needed to stand the pole up is accumulated by swings, 
because the force is not sufficiently strong to make the pole stand up in a single roll. This 
idea comes easily to human minds through experience and learning. However, it has 
been almost impossible for conventional artificial intelligence to obtain an intelligent 
solution to this problem without human help. These experiments corroborate that it is 
possible to obtain high intelligence through evolution. 

Fig. 9. Time transitions of the angle, the position, and the force [f_NN × 20 N (6-5-1, Range C)] 

Fig. 10. Time transitions of the angle, the position, and the force [f_NN × 10 N (6-5-1, Range C)] 

Fig. 11. Time transitions of the angle, the position, and the force [f_NN × 3 N (6-5-1, Range C)] 

Fig. 12. Time transitions of the angle, the position, and the force [f_NN × 1 N (6-5-1, Range C)] 

In the case of Range A, the control patterns are an on-off control with high 
frequency, as shown in Figures 5 to 8. Because a strong force is needed for the pole to stand 
up in a single roll, the neural networks with larger weights survive in the evolutionary 
process. However, the neural networks with larger weights can hardly make the driving 
force small. Therefore, it is conjectured that they select the on-off control with high 
frequency to stabilize the pole at the inverted position. For practical use, a driving force 
with a high-frequency on-off control is unsuitable. In the case of Range C, the control 
patterns are very similar except that the nonlinearity of the sigmoid functions is re- 
duced by small weights. As the results imply, it is supposed that neural networks with 
small weights tend to select control patterns with swings as shown in Figures E] and 
I I III Although the frequency of semi on-off control lessens, the stabilization performance 
worsens, as shown in Figures 9 to 12. 



5 Conclusions 

This study on applying evolutionary neural networks to control a rolling inverted 
pendulum confirms that it is possible for a machine to obtain, by evolution, a higher 
intelligence that can hardly be obtained by control theories and conventional artificial 
intelligence without human help. Intelligent MLPs are evolved by GA that have 
sophisticated control patterns with different numbers of pre-swings, depending on the 
magnitude of the maximum driving force. The two types of MLPs are compared for 
evolutionary performance, and it is found that the (6-5-1) model, which has a 
continuous angle parameter, achieves better performance. The effect of the weight range 
of neural networks on the control patterns is made clear through these experiments. 

Future work is to evolve MLPs with more robust and smoother control, to analyze 
the functions of the MLPs obtained as solutions, and to evolve weights and architectures 
of neural networks together 0. 

Abstract. We studied a baggage carriage problem, in which agents try 
to carry baggage from a pile to their base, by evolving a data structure 
called n-BDD that expresses the action strategy of the agents. Through this 
problem we consider the emergence of cooperation among heterogeneous 
agents, that is, agents each of which has a particular ability that differs 
from those of the others. We formalize heterogeneity by defining forte actions 
and foible actions of agents. Emergence of cooperation was observed in simulation. 
Different types of agents emerged that behave in different ways. We also 
observed that agents of one type branch into two different roles. 



1 Introduction 

Cooperation within a group is a subject of study in the field of multi-agent systems. 
There are two ways for many agents to cooperate. One is for the agents to share a task, 
with every agent playing the same role. We call this homogeneous cooperation. 
The other is for each agent to play a particular role, that is, the right man in 
the right place. We call this heterogeneous cooperation. 

A number of approaches to emerge cooperative interactions among agents are 
investigated. H.Yanco et aim tries to emerge cooperation among mobile robots 
by learning non-verbal communication. Reinforcement learning was also used to 
acquire social action of agents and decision policy in multi-agent systems 
Some researches mimic natural creatures, such as ants and bees, to organize 
social behavior among agents [US] and apply the behavior to real robots |S]. 

In order to study the evolution of heterogeneous cooperation, we use a group of 
heterogeneous agents. That is, some agents in the group have abilities different 
from those of others. The aim of this paper is to observe the evolution of heterogeneous 
cooperation that reflects this heterogeneity. For this purpose we use a 
data structure, called n-BDD, to express the action strategy of agents. It was first 
proposed by Moriwaki et al. [7j and we have demonstrated its applicability (BE] • 

We give a goal to a set of heterogeneous agents in an environment, and 
expect them to evolve appropriate behavior there. This evolution is not 
obvious, because agents’ abilities do not relate directly to their behavior. They 
have to find their appropriate behavior only through experience. In this 
paper, we give an environment where agents can accomplish their work without 
cooperation but can do so more efficiently with cooperation. 

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 231- 123^ 1999. 

(c) Springer- Verlag Berlin Heidelberg 1999 

Fig. 1. Field of the problem. 



Fig. 2. Deciding action by n-BDD. 



2 Baggage carriage problem 

In this problem there are two kinds of agents (defined later), a base, and a
pile of baggage. Each agent tries to find the pile of baggage, to take baggage
there, and to find the base, where the baggage should be put.

Agents work on the field shown in Fig. 1. The field is a two-dimensional plane
with a grid, enclosed by a frame. Agents cannot go out of this frame. There is
a pile of baggage at the left of the field and a base at the right. In the
figure an agent is shown as a circle or a diamond to denote the two different
types of agents. The difference between the two types is described in
Section 4. Shaded agents are ones carrying baggage.

We expect roles to emerge among the agents. Some agents may lift baggage and
hand it to others. Some agents may receive baggage and carry it to the base.
Others may put down, at the base, baggage received from a carrier.



3 Definition of agents in our problem 

Input and output of agents An agent is formalized as a machine that takes
an input string and outputs an action by using an evolved strategy. The
outline is shown in Fig. 2. Input strings reflect the environment surrounding
the agent. At every time step on the field, an agent takes an input string,
calculates its action, and executes it. An agent can see objects in the
surrounding 13 x 13 square of the grid. This information is coded in an input
bit string:

P = (x0, x1, x2, x3, x4, x5, x6).

We call the square the visual scope of an agent. Each of the bits x1, ..., x6
in the bit string P for an agent is one (1) when the corresponding object is in
the visual scope of the agent, as described in Table 1, and zero (0) otherwise.
x0 gives the self-status of the agent. It is one (1) if the agent has baggage.
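
As an illustration, the coding of the input bit string can be sketched in
Python as follows. The names Observation, encode_input and in_visual_scope are
hypothetical and do not come from the paper. The sketch also assumes that the
agent sits at the center of its 13 x 13 visual scope, which the text does not
state explicitly.

```python
from dataclasses import dataclass

SCOPE_HALF_WIDTH = 6  # assumption: the agent is at the center of a 13 x 13 square


def in_visual_scope(agent_xy, object_xy):
    """Return True if the object lies inside the 13 x 13 square around the agent."""
    dx = abs(object_xy[0] - agent_xy[0])
    dy = abs(object_xy[1] - agent_xy[1])
    return dx <= SCOPE_HALF_WIDTH and dy <= SCOPE_HALF_WIDTH


@dataclass
class Observation:
    """One flag per input bit, in the order of the bit string P."""
    has_baggage: bool                      # bit 0: self-status
    sees_pile: bool                        # bit 1
    sees_base: bool                        # bit 2
    sees_same_type: bool                   # bit 3
    sees_opposite_type: bool               # bit 4
    sees_same_type_with_baggage: bool      # bit 5
    sees_opposite_type_with_baggage: bool  # bit 6


def encode_input(obs):
    """Return the input bit string P as a tuple of seven 0/1 values."""
    return tuple(int(flag) for flag in (
        obs.has_baggage,
        obs.sees_pile,
        obs.sees_base,
        obs.sees_same_type,
        obs.sees_opposite_type,
        obs.sees_same_type_with_baggage,
        obs.sees_opposite_type_with_baggage,
    ))
```

An agent carrying baggage that sees only the base would thus receive the
string (1, 0, 1, 0, 0, 0, 0).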

An agent takes one of the sixteen actions described in Table 2. The algorithm
that executes each action is omitted for lack of space.



Evolving Cooperative Actions Among Heterogeneous Agents 233 



Table 1. Meanings of each input bit.

Input bit   The case where the bit is 1
x0          Has baggage.
x1          A pile of baggage is in the visual scope.
x2          A base is in the visual scope.
x3          An agent of the same type is in the visual scope.
x4          An agent of the opposite type is in the visual scope.
x5          An agent of the same type with baggage is in the visual scope.
x6          An agent of the opposite type with baggage is in the visual scope.



Table 2. Actions of agents.

Action   Description
A        Approaches a pile of baggage.
B        Goes away from a pile of baggage.
C        Approaches a base.
D        Goes away from a base.
E        Approaches an agent of the same type.
F        Goes away from an agent of the same type.
G        Approaches an agent of the opposite type.
H        Goes away from an agent of the opposite type.
I        Takes baggage.
J        Puts baggage.
K        Gets baggage from an agent of the same type.
L        Hands baggage to an agent of the same type.
M        Gets baggage from an agent of the opposite type.
N        Hands baggage to an agent of the opposite type.
O        Walks at random.
P        Does nothing.



Table 3. Steps consumed for actions by agents.

                 take (I)   put (J)    transport (A to H, K to P)
Type-A (Truck)   70 steps   70 steps   1 step
Type-B (Lift)    1 step     1 step     7 steps



Strategies to decide actions We use a data structure, n-BDD, to encode
action strategies. An n-BDD gives one of the actions shown in Table 2 from an
input bit string consisting of the input bits described in Table 1. Each agent
has an n-BDD and obeys the action strategy encoded in it. This data structure
gives an efficient calculation of actions and a good framework for evolution,
as explained in the Appendix. The thick arrows in Fig. 2 show an n-BDD giving
the action B from a bit string.



A group of agents To focus on cooperation, we evaluate the work of agents by
the total points accomplished by all agents in the field. Eight agents, four of
type A and four of type B (explained later), make a group and act in the field.
The total points are also used as the fitness for evolution.

We could also use another fitness, the points accomplished by an individual
agent alone. We do not take this approach, however, because the effectiveness
of total points as the fitness of a group has been investigated and
demonstrated in [S|.

4 Heterogeneous ability of agents

To give agents heterogeneous abilities, we define forte actions and foible
actions for each agent. A forte action is an action that an agent can take with
only a small amount of energy or in a short time. A foible action is an action
that costs an agent a large amount of energy or a long time.

We can formalize the heterogeneity of agents by giving sets of forte actions
and foible actions. Here we define the forte and foible actions of two
different types of agents by the time lengths of the actions:

Type-A agents (Truck agents) We give type-A agents characteristics like
those of dump trucks. They are good at carrying baggage, but not good at
lifting it up and putting it down. So, the actions to transport baggage
(Actions A to H and K to P with input bit x0 = 1) are forte actions. The
action to take baggage (Action I) and the action to put it down (Action J)
are foible actions.
Type-B agents (Lift agents) Type-B agents have the characteristics of lifts.
Their abilities are the converse of those of type-A agents. So, the actions
to take (Action I) and to put (Action J) baggage are forte actions, and the
actions to transport (Actions A to H and K to P with input bit x0 = 1) are
foible actions.

Table 3 gives the time lengths (steps) consumed by the actions.
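
To make the effect of these step counts concrete, the following Python sketch
looks up the cost of an action for each agent type. The dictionary layout and
the function name steps_consumed are hypothetical. For simplicity the sketch
charges the transport cost regardless of whether the agent is carrying baggage,
which is an assumption: the text specifies forte and foible transport only for
agents that have baggage.

```python
# Actions A to H and K to P are treated as transport actions (Table 3).
TRANSPORT_ACTIONS = set("ABCDEFGHKLMNOP")

# Steps consumed per action kind, copied from Table 3.
STEPS = {
    "truck": {"take": 70, "put": 70, "transport": 1},  # Type-A
    "lift": {"take": 1, "put": 1, "transport": 7},     # Type-B
}


def steps_consumed(agent_type, action):
    """Return the number of time steps an action costs the given agent type."""
    if action == "I":
        kind = "take"
    elif action == "J":
        kind = "put"
    elif action in TRANSPORT_ACTIONS:
        kind = "transport"
    else:
        raise ValueError(f"unknown action: {action!r}")
    return STEPS[agent_type][kind]


# Taking and putting one piece of baggage, ignoring all movement:
truck_alone = steps_consumed("truck", "I") + steps_consumed("truck", "J")
lift_alone = steps_consumed("lift", "I") + steps_consumed("lift", "J")
```

Here truck_alone is 140 steps and lift_alone is 2 steps, so a Truck that
handles its own loading and unloading spends more than a quarter of a 500-step
run on a single piece of baggage.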



5 Evolution with an Evolutionary Method 

To evolve cooperation among agents we use an evolutionary method, of which
Fig. 0 shows an outline. In the algorithm, fifty initial groups, each
consisting of eight agents, are generated randomly. Every group is evaluated
by running it on the simulator for 500 time steps. Each group is given the
following value as its fitness.

fitness = (the total number of pieces of baggage taken from the pile)
          + 2 (the total number of pieces of baggage put on the base)

To accomplish the goal of bringing baggage to the base, the second term of the
above expression would be sufficient as the fitness. However, we add the first
term to make evolution effective at its initial stage.
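
A minimal sketch of this fitness, reading it as the number of pieces taken from
the pile plus twice the number of pieces put on the base. The function and
parameter names are hypothetical.

```python
def group_fitness(taken_from_pile, put_on_base):
    """Fitness of one group after a 500-step run.

    taken_from_pile: total pieces of baggage taken from the pile by the group
    put_on_base:     total pieces of baggage put on the base by the group
    """
    return taken_from_pile + 2 * put_on_base
```

For example, a group that takes five pieces but delivers none scores 5, while a
group that takes five and delivers three scores 11, so early generations are
already rewarded for finding the pile.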

The evaluated groups are transferred to the next generation based on the elite
strategy (Fig. 0). The ten best groups are transferred as they are. The ten
worst groups are discarded. The other thirty groups, together with copies of
the best ten, i.e. forty groups in all, are modified by genetic operations and,
with the ten best groups, make up the next generation. We used genetic
operations with probability 0.5 for mutation, 0.25 for insertion and deletion.
The genetic operations are described in the Appendix.

6 Experimental results
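
The generation step described above can be sketched in Python as follows. This
is an illustration under stated assumptions, not the authors' implementation:
the names next_generation, choose_operation and the modify callback are
hypothetical, and choose_operation assumes that exactly one operation is
applied per modification, with probability 0.5 for mutation and 0.25 each for
insertion and deletion. The text could also be read as applying the operations
independently.

```python
import random


def choose_operation(rng=random):
    """Pick one genetic operation (assumed reading: exactly one per modification)."""
    r = rng.random()
    if r < 0.5:
        return "mutation"
    if r < 0.75:
        return "insertion"
    return "deletion"


def next_generation(groups, fitness, modify, rng=random):
    """Build the next generation of 50 groups with the elite strategy.

    groups:  list of 50 groups
    fitness: function mapping a group to its fitness value
    modify:  function (group, rng) -> new modified group; it must not
             change its argument, so that the elite survive unmodified
    """
    ranked = sorted(groups, key=fitness, reverse=True)
    elite = ranked[:10]                 # ten best, transferred as they are
    middle = ranked[10:-10]             # thirty groups; the ten worst are dropped
    to_modify = middle + list(elite)    # forty groups to be modified
    return elite + [modify(group, rng) for group in to_modify]
```

With fifty groups the result again contains fifty groups: the ten elite groups
plus forty modified ones.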

In this section we show experimental results and give analyses of the results. 



Analysis of fitness of group Fig. shows the transition of the fitness of the
best group at each generation. The graph shows the average fitness over 50
trials. The fitness of the group increased over the generations through the
agents' learning. In particular, we can see a rapid increase of fitness around
the 1200th generation. This is observed in many cases and is explained later.



Analysis of ranges where agents act We divide the field into four field
ranges s1, s2, s3 and s4, as shown in Fig. El, to analyze the positions in the
field where each agent spends a large portion of its time. We can expect
differences according to the roles of agents in the problem and also to the
types of agents.

Fig. El shows histograms of the agents' positions during 500-step simulations
through the generations. There is one histogram for each agent: Trucks #1 to #4
and Lifts #1 to #4. Each histogram consists of 15 small histograms, one for
every 200th generation, for the best group at that generation. The number of
steps that an agent spends in each field range at that generation is plotted in
the histogram.

From the histograms we can observe that Lift agents stay in the field ranges
adjacent to the pile of baggage or to the base. On the other hand, Truck agents
are found in every range. This tendency becomes clearer as generations pass,
especially after the 1200th generation. The rapid increase of fitness at the
1200th generation may be related to this phenomenon. We also observe that the
four Lift agents split into two groups. Lifts #1 and #2 are mostly in the field
range adjacent to the pile of baggage, and the others are in the range adjacent
to the base.



Analysis of handing actions In Fig. 7 we analyze handing actions from one
agent to another. We categorize handing actions by the types of the sender and
receiver agents. The categories are described in Fig. 7.

Fig. 7 shows histograms of the number of handing actions that happened in each
category and in each field range. As generations pass, only category #2
handings happen, and they happen only in field ranges s1 and s4. This
observation is consistent with the expected cooperation.

Fig. 8 shows the n-BDDs of representative agents obtained. We can see that
Truck agents do not have nodes for the actions to take and put baggage, while
Lifts have them. The nodes of Trucks are for transport and handing actions, and
they are selected under carefully distinguished conditions. Lifts have fewer
nodes and are specialized for their functions. The two different roles of Lifts
are also seen in the difference in their structures: Lift #1 has node I and
Lift #4 has node J.
              Sender   Receiver
Category #1   Truck    Truck
Category #2   Truck    Lift
              Lift     Truck
Category #3   Lift     Lift

200th to 3000th: generations, s1 to s4: field ranges,
0 to 30: the number of handing actions in each category in the field ranges.

Fig. 7. Histograms of handing actions by agents.




(a) Truck #1   (b) Lift #1   (c) Lift #4

Fig. 8. n-BDD structures of representative agents evolved.



7 Conclusions 

Our observations are summarized as follows: (1) Each type of agent works in a
particular field range. (2) Handing actions are done between different types of
agents, and (3) they are done around the pile of baggage and the base. (4) Lift
agents split into two groups: one group works near the pile and the other works
near the base. (5) These tendencies strengthen as generations pass.

We conclude that each agent found its individual role, and each agent selects
its forte work for efficiency. “The right man in the right place” has emerged.
The cooperation that emerged includes two different kinds of branching of the
agents' roles. First, the two different types of agents, Trucks and Lifts, came
to play two different roles as shown in Fig. O Second, the Lift type agents
branched to play their roles in the two different field ranges as shown in
Fig. |B| and [7].
Through this paper, we showed the possibility of making cooperative work emerge
among heterogeneous agents by using n-BDD and its genetic operations. Other
evolutionary methods should also be compared in future work. We characterized
the agents' abilities by the numbers of steps consumed by actions.
Investigation with different step consumptions also remains for future work.




(a) mutation (b) insertion (c) deletion

Fig. 9. Genetic operations for n-BDDs. 



Appendix n-BDD and genetic operations 

A BDD (binary decision diagram) is a graphical notation for logical functions.
A BDD has only two kinds of terminal nodes, labeled true or false, but an n-BDD
can have more than two labels and gives a value from any set of values. The
genetic operations mutation, insertion and deletion are defined to operate on
an n-BDD as a gene.

Mutation redirects an edge to a randomly selected node. The node must be
subordinate to the node from which the edge originates. Because of this
restriction, a loop or a cycle is never created. Insertion inserts a new
decision node on a randomly selected edge. Either the 0-edge or the 1-edge of
the new decision node is randomly selected to point to the node that the edge
pointed to before. The other edge points to a randomly selected subordinate
node. Deletion deletes a randomly selected decision node. The edge that pointed
to the deleted node is redirected to one of the nodes that the deleted node
pointed to.
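
A minimal Python sketch of an n-BDD and of the mutation operation is given
below. It is an illustration only: the class and function names are
hypothetical, and the sketch assumes that nodes are kept in a list in which
every edge points to a node with a larger index, which is one simple way to
realize the "subordinate" restriction. The paper does not say how the authors
store the graph.

```python
import random
from dataclasses import dataclass


@dataclass
class Terminal:
    action: str   # label of the action given by this terminal, e.g. "A" to "P"


@dataclass
class Decision:
    bit: int      # index of the input bit tested by this node
    low: int      # index of the node followed when the bit is 0 (0-edge)
    high: int     # index of the node followed when the bit is 1 (1-edge)


def decide(nodes, inputs, root=0):
    """Follow edges from the root until a terminal is reached; return its action."""
    i = root
    while isinstance(nodes[i], Decision):
        node = nodes[i]
        i = node.high if inputs[node.bit] else node.low
    return nodes[i].action


def mutate(nodes, rng=random):
    """Redirect one randomly chosen edge to a randomly chosen subordinate node.

    Because the new target always has a larger index than the node the edge
    leaves, the graph stays acyclic.
    """
    decision_ids = [i for i, n in enumerate(nodes) if isinstance(n, Decision)]
    if not decision_ids:
        return
    i = rng.choice(decision_ids)
    target = rng.randrange(i + 1, len(nodes))
    if rng.random() < 0.5:
        nodes[i].low = target
    else:
        nodes[i].high = target
```

Since each decision node is visited at most once on a path, deciding an action
takes at most as many steps as there are decision nodes.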
Abstract. This paper proposes a multi-agent system that carries out
cooperative works. To achieve such works, Fuzzy Associative Memory
Organizing Unit Systems (FAMOUS), Chaotic FAMOUS (CFAMOUS),
and Conceptual Fuzzy Sets (CFS) are employed. With the proposed
system, each agent robot can decide its own behavior according to the
situation in its environment. We apply this system to a Welfare Agent
Robot System and carry out simulations.



1 Introduction 

Recently, the population of elderly people has been increasing, while at the
same time the number of people nursing the elderly has been decreasing. This
problem may well require robots to nurse the elderly in place of humans. From
such a viewpoint, we are trying to achieve cooperative works with a welfare
agent robot system (i.e., a walking support robot system), where robots
cooperate with humans and also with other robots.

Accordingly, we propose a multi-agent algorithm in which each robot decides its
own movement to create a certain formation (line style or circle style). We
construct the system with a common robot control part that uses FAMOUS,
CFAMOUS, and CFS. This control part realizes both (1) and (2) below.

(1) A robot determines its own movement by following the general instructions of 
a human and the situation in its environment, without detailed instructions 
from a human. 

(2) When the robot cannot cope with a problem by using existing knowledge 
without the instructions of a human, the robot creates new knowledge to 
avoid trouble. 

Experimental results have shown the effectiveness of the welfare agent robot 
system. 

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 240-250, 1999.

(c) Springer-Verlag Berlin Heidelberg 1999

2 Multi-agent Systems for Welfare Agent Robots

In this paper, we regard a welfare agent robot as a walking support robot, such 
that the robot is equipped with a stick and walks with humans. Figure 1 shows 
an image of a welfare agent robot system. 

Each agent robot works cooperatively with other agent robots and with humans.
The agent robot's brain consists of (a) a local feedback block, (b) a
microscopic knowledge block, and (c) a macroscopic knowledge block.

(a) Local feedback block: Analyzes information from a camera and sends transfer 
orders to the robot. 

(b) Microscopic knowledge block: Determines the goal coordinates where the 
robot will move to. 

(c) Macroscopic knowledge block: Judges whether the configuration of the robot 
is right or not. 

We construct the robot control block by using the Rasmussen model, which is
well known as an efficient model for constructing intelligent systems. We
construct the control block of the welfare agent robot by using FAMOUS,
CFAMOUS, and CFS (described in Sections 2.1 and 2.2).

The agent robots have standard formations (mentioned in Section 2.3). By 
defining these, the agent robots are able to do cooperative works in various 
scenes. 

2.1 FAMOUS and CFAMOUS 

FAMOUS [1] achieves fuzzy associative inference by constructing fuzzy knowledge
on an associative memory. Figure 2 illustrates this system, which uses fuzzy
rules as fuzzy knowledge. FAMOUS adopts BAM (Bidirectional Associative Memory)
[2], giving it bidirectional retrieval capabilities. Therefore, the system can
retrieve the most appropriate pattern to fit the input conditions, by bottom-up
and top-down processing conveying active values. (A regular associative matrix
is used for the BAM associative matrix.) Furthermore, the system can extract
knowledge from active values in post reflection movements, because it has
familiar, clear-cut knowledge expression capabilities.

Fig. 1. Image of a welfare agent robot system.

If-layer Rule-layer Then-layer

Fig. 2. Construction and fuzzy rules of FAMOUS

The associative inference performed on FAMOUS features the following. 

1) Bidirectional conversion is possible between oral expressions and feature- 
value pattern-like images. 

2) Macroscopic elements can be analyzed into microscopic elements. 
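
The bidirectional retrieval that FAMOUS inherits from BAM can be illustrated
with the classical bidirectional associative memory on bipolar vectors. The
sketch below is that textbook construction in plain Python, not the FAMOUS
implementation; the function names are hypothetical, and the fuzzy rule layers
of Figure 2 are not modelled.

```python
def train_bam(pairs):
    """Build the associative matrix as the sum of outer products x^T y.

    pairs: list of (x, y) with entries +1 or -1.
    """
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += x[i] * y[j]
    return w


def _threshold(sums, previous):
    """Bipolar threshold; a zero sum keeps the previous value."""
    return [1 if s > 0 else -1 if s < 0 else p for s, p in zip(sums, previous)]


def recall(w, x, max_iters=20):
    """Alternate bottom-up (x to y) and top-down (y to x) passes until stable."""
    n, m = len(w), len(w[0])
    x = list(x)
    y = [1] * m
    for _ in range(max_iters):
        new_y = _threshold(
            [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)], y)
        new_x = _threshold(
            [sum(w[i][j] * new_y[j] for j in range(m)) for i in range(n)], x)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y
```

Given a stored pair and an input with one flipped element, the alternating
passes settle on the stored pair, which is the sense in which the most
appropriate pattern for the input conditions is retrieved.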

CFAMOUS adopts chaotic retrieval in the retrieval process of FAMOUS. We use a
chaotic abrupt descending method for chaotic retrieval. This method allows
chaotic occurrences in the minimal energy area by applying a periodically
variable nonlinear resistance, in order to scatter the dynamics formula for
movement on the energy surface of the neural network. CFAMOUS provides the
following two functions.

1) From among stored patterns, the system dynamically retrieves patterns 
within a range close to the input patterns. 

2) If no patterns are stored, the system retrieves effective new patterns in terms 
of meaning. 

The second function means that multiple patterns retrieved chaotically from 
patterns not stored may include possibly effective patterns in terms of mean- 
ing. In other words, CFAMOUS can evaluate the effectiveness of such retrieved 
patterns, since knowledge is clearly expressed on each node of the network with 
meaningful fuzzy labels. 
Fig. 3. Robot formations.



These functions enable CFAMOUS to create new knowledge based on existing 
knowledge stored in the associative memory. Therefore, creativity (support) is 
possible by adopting CFAMOUS. 

2.2 CFS 

Fuzzy sets provide strong notations for representing real world concepts, which 
are essentially vague. However, they do have problems caused by the restrictions 
of numerical membership functions, restrictions of logical expressions, lack of 
context dependency, etc. These problems relate to the representation of the 
meaning of a concept. 

In this paper, we propose Conceptual Fuzzy Sets (CFS)0, fuzzy sets of a 
new type that conform to Wittgenstein’s ideas on the meanings of concepts. A 
CFS is achieved as an associative memory, combining a long-term memory and 
a short-term memory, thereby reducing the complexity of knowledge representa- 
tion. In addition to solving the above problems, CFS provides a simple formula 
for knowledge representation and a procedure for using this knowledge. We in- 
troduce an inductive method for constructing CFS based on neural network 
learning. The effectiveness of CFS and of the learning method are illustrated 
through their application to the recognition of facial expressions. 

CFS represents the meanings of concepts in multiple layers. The meaning of a
concept is translated into an expression indicated by the distribution of
activation in each layer. Propagations arise from the activation of the
concept. Conversely, when the activation of lower concepts determines the
activation of an upper concept, this corresponds to recognition or
understanding.

2.3 Robot Formations 

As shown in Figure 3, we propose two robot formations. For the line style 
formation, all agent robots line up at regular intervals on a line, for example, 
two agent robots and a human in the center. 
In the case of the circle style formation, all agent robots line up at regular
intervals on a circle. When there are three agent robots, they form an
equilateral triangle; with four, a square.
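
The target positions of the two formations can be sketched geometrically as
follows. The function names and parameters (center, spacing, radius, heading)
are hypothetical, and the sketch covers only the geometry of the goal
positions, not how each robot decides its own goal with CFS.

```python
import math


def line_formation(center, spacing, n_slots, heading=0.0):
    """Return n_slots points at regular intervals on a line through center.

    With n_slots = 3 the middle slot can be taken by the human and the two
    outer slots by the robots.
    """
    cx, cy = center
    points = []
    for k in range(n_slots):
        offset = (k - (n_slots - 1) / 2) * spacing
        points.append((cx + offset * math.cos(heading),
                       cy + offset * math.sin(heading)))
    return points


def circle_formation(center, radius, n_robots):
    """Return n_robots points at regular angular intervals on a circle."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n_robots),
             cy + radius * math.sin(2 * math.pi * k / n_robots))
            for k in range(n_robots)]
```

circle_formation gives the vertices of an equilateral triangle for three
robots, a square for four, and a regular pentagon for five.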



3 Simulations and Robot Experiments 

3.1 Prerequisite

The system consists of a tracking block, sensor data computing block, and robot 
control block. The tracking block is composed of a CCD camera, color tracker, 
and color extractor. This block computes the center of gravity of the robot based 
on color information, and assumes it to be the robot’s position. Then it outputs 
the coordinate values for a two-dimensional coordinate system. These coordinate 
values are transferred to the computer every 1/60th of a second. Processing is
done in hardware. In addition, as shown in Figure 4, a visual field is assumed
for the robot.



3.2 Robot Control Block 

Figure 5 shows the CFS representing an action of the welfare robot. The
environmental information that a robot obtains through the CCD camera or
infrared sensor is analyzed. At the lower layer, other information, e.g., the
positions of other robots and the number of robots in the visual field, is
input in order to determine the movement direction and migration length of the
robot.

At the upper layer, the inputs include 1) the difference from the closest robot 
and the distance from the farthest robot, and 2) the difference from the robot 
of the right side center piece and the angle of the robot of the left side center 
piece. The outputs here are the movement direction and the migration length 
adjustment to the result of the lower layer. Finally, the result of the lower layer 
and the result of the upper layer are integrated. 




Fig. 4. Robot sense of distance. 
Fig. 5. CFS representing a welfare robot



3.3 Simulation Experiments 

To verify the effectiveness of the control rule used in this paper, we experimented 
with two cases, i.e., only using the lower control rule of CFS and using CFS. 
Table 1 shows the results of a circle style formation experiment and a line style 
formation experiment. In both experiments, the left side of the table used CFS 
and the right side used only the lower control rule. 

Table 1. Circle style formation and line style formation.

Circle style formation           Line style formation
CFS           lower rule         CFS           lower rule
Ave    Sim    Ave    Sim         Ave    Sim    Ave    Sim
33.4   7.6    36.8   8.9         35.3   3.9    38.2   4.6

Individual entries (column alignment not recoverable):
36 45 30 3 34 4 41 13 32 11 38 43 42 50 18 41 44 39 33 24 35 27 27 39 42
18 18 57 56 36 40 20 17 30 34 59 31 87 12 39 4 43 5 27 3 28 8 29 3 34 4

Ave: average of each robot’s migration length
Sim: simulation time

In Figure 6, first, three robots are placed at random positions. Then, a human
sends an order to them to arrange themselves in a single row. All of the robots
get the same order. The human, however, does not issue any detailed instruction
to each robot. The robots judge where to go by themselves and then move.
Finally, they reach the correct positions independently.

Fig. 6. Simulation result of line style formation with three robots.

In Figure 7, first, four robots are placed at random positions. When they get
an order from a human to arrange themselves in a circle, they move to the four
corners of a square.

When the number of robots increases from the number placed first, the formation
of the robots changes. As shown in Figure 8, four robots are placed first and
one robot is added later. Because of the addition, the formation of the robots
changes from a square to a regular pentagon.

3.4 Robot Experiments 

We apply knowledge obtained by simulation to real robot experiments. In the
experiments, we use two robots and create formations of the two robots and one
person. In Figure 9, the person is in the center of a field (of length 1 m by
width 1 m), and the two robots arrange themselves at both sides to make a line
style formation.

In Figure 10, the person is in the center of a field (of length 1 m by width
1 m), and the two robots move to make a circle style formation.

Fig. 7. Simulation result of circle style formation with four robots.

Fig. 8. Simulation result of circle style formation with four robots and five robots.

Fig. 9. Experimental result of line style formation with a person and two robots.

Fig. 10. Experimental result of circle style formation with a person and two robots.



4 Formation Movements using Chaotic Evolutionary 
Computation 

It is necessary to work cooperatively to maintain a formation style. For example, 
when two persons walk together, they generally walk side-by-side to maintain 
their formation. When turning, the outside person walks faster, while the inside 
person slows down. 
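
The walking analogy can be made concrete with elementary kinematics: two
walkers who stay side by side through a turn share the same angular speed
about the turning center, so their speeds are proportional to their turning
radii. The Python sketch below illustrates only this geometric relation; the
function name and parameters are hypothetical, and it is not the rule that the
robots acquire by chaotic retrieval.

```python
def side_by_side_speeds(center_speed, turn_radius, gap):
    """Return (inner_speed, outer_speed) for two walkers turning side by side.

    center_speed: speed of the midpoint between the two walkers
    turn_radius:  turning radius of that midpoint
    gap:          distance kept between the two walkers
    """
    if turn_radius <= gap / 2:
        raise ValueError("turn_radius must exceed half the gap")
    omega = center_speed / turn_radius          # common angular speed
    inner = omega * (turn_radius - gap / 2)     # inside walker slows down
    outer = omega * (turn_radius + gap / 2)     # outside walker speeds up
    return inner, outer
```

For a midpoint speed of 1.0, a turning radius of 2.0 and a gap of 1.0, the
inside walker moves at 0.75 and the outside walker at 1.25.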

We apply these concepts to two robots moving in parallel. Each of the robots 
obtains knowledge based on chaotic retrieval using the proposed method. 

In the case of formation movements, the robot control block is switched to an- 
other control block, i.e., one representing formation movements. Two robots will 
have the same control block using CFS. The low-level layer maintains such fun- 
damental action rules as going forward and turning left or right. The high-level 
layer maintains steering and speed control rules and another rule to maintain a 
constant distance from the other agent when both are moving together. Chaotic 
retrieval is applied to the steering and speed control of this high-level layer. 
Then, it becomes possible to obtain rules on how to move together in parallel, 
by conceiving and adjusting the steering degree and moving speed. Figures 11 
and 12 show two real robots moving side-by-side while maintaining a regular 
interval. 




Fig. 11. Simulation locus of two robots moving side-by-side. 
Fig. 12. Experimental result of Figure 11.



5 Effectiveness of Soft DNA 

In a system in which a large number of agents work, it is desirable for the
agents to share roles. To produce cooperative works, we propose a new method
(under discussion).

We call this new method “Soft DNA” (Soft computing oriented Data driven
fuNctional scheduling Architecture) |^. Soft DNA aims to imitate the idea of
the developmental process, such as the body plans in actual life based on
biological DNA (DeoxyriboNucleic Acid).

In biological DNA, the genes called “homeo box genes” dynamically control the
body development of an individual in actual life based on the concentrations of
proteins in cells. The control architecture called “soft DNA” dynamically
controls the development of intelligence in each agent based on environmental
information, in order to achieve dynamic cooperation. Biological DNA has sets
of genes that are related to each body part such as the head, chest, abdomen,
and tail. These sets of genes are each called a homeo box.

Similarly, soft DNA has boxes of intelligence (made by soft computing, i.e.,
associative memories, neural networks, fuzzy logic, chaos, and so on) that are
related to various environments. A suitable box of intelligence is developed
according to the environmental information available, as shown in Figure 13.

Soft DNA has boxes characterized as a set of roles, e.g., a search robot, 
transportation robot, or construction robot. 

All agent robots have the same soft DNA and can switch their own roles 
dynamically. 
Fig. 13. Biological DNA and Soft DNA



We are trying to apply soft DNA to a multi-agent robot system. As an ex- 
ample, when it was applied to an intelligent transport system (ITS), the average 
of the vehicular gap error improved about 1/4. 

6 Conclusions 

This paper proposed a multi-agent system that carries out cooperative works. 
To achieve cooperative works, we proposed fuzzy sets of a new type named 
Conceptual Fuzzy Sets (CFS). By using FAMOUS and CFS, each agent robot has 
become able to determine its own behavior for the situation in its environment. 
We applied this system to a welfare agent robot system and showed the usefulness 
of CFS. In addition, we showed the possibility of soft DNA as a new method. 
Abstract. This paper proposes a chaotic evolutionary computation
algorithm, instead of the conventional GA (Genetic Algorithm), for
intelligent agents such as welfare robots that assist humans. This
evolutionary computation is realized by applying chaotic retrieval and
Soft DNA (Soft computing oriented Data driven fuNctional scheduling
Architecture) to associative memories. We apply this evolutionary
computation to multi-agent robots that move abreast and to ITS
(Intelligent Transport System). Essentially, the process of this
evolutionary computation is parallel processing. Therefore, we implement
its parallel processing algorithm on the A-NET (Actors NETwork) parallel
object-oriented computer, and show the usefulness of parallel processing
for the proposed evolutionary computation.



1 Introduction 

Recently, evolutionary computation models in Alife (Artificial life) have been
studied by computerp[]. Nowadays, the typical approach is the GA (Genetic
Algorithm). The conventional GA is an algorithm based on traditional Darwinism.
On the other hand, in recent years, new theories of evolution other than
Darwinism have been advocated. Nakahara et al. have advocated the virus theory
of evolution, which explains rapid evolution that cannot be explained by
mutation and natural selection|j^. Yomo et al. have advocated evolution based
on competitive coexistence, and argue that evolution is not simple optimization
because of interaction among lifep]. In any case, it is certain that the
evolution of actual life is not such a simple process as the conventional GA.
Above all, we think there are not only genetic factors but also other factors
(e.g., cultural factors) in the evolutionary process of the brain or its
intelligence. In addition, it is said that evolution is an irreversible
process, which does not allow life to become again exactly the same life as it
used to be. It seems to us that there is chaos in this complexity of evolution.

Therefore, we propose evolutionary computation of intelligence by chaotic
dynamics and Soft DNA (Soft computing oriented Data driven fuNctional
scheduling Architecture) as shown in Fig. 1 [4]. We explain this evolutionary
computation and Soft DNA in the following section.




Fig. 1. Evolutionary computation of intelligence by chaotic dynamics 




Fig. 2. Welfare intelligent agents and intuition-based agent model 



On the other hand, in an aging society, welfare
agent robots which assist the old or the sick are required, as shown
in Fig. 2. The welfare robots have to move in a suitable formation, in cooperation
with other agents, humans and the outer environment. The knowledge of
cooperative work has to be acquired efficiently. Therefore, we apply the proposed
evolutionary computation to multi-agent robots which move abreast, as an
example of cooperative work in welfare robots, and we also apply this evolutionary
computation to ITS (Intelligent Transport System). The brain of this
welfare robot is constructed from the intuition-based agent model shown in
Fig. 2. This agent model consists of hierarchical fuzzy knowledge that uses associative
memories, and it imitates human creativity in order to adapt itself
to environmental changes, conceiving new ideas based on current knowledge
by chaotic retrieval. Each hierarchical part retrieves knowledge by
fuzzy associative inference on associative memories. This inference in
each part is essentially parallel, and the hierarchical parts also work in parallel.
Furthermore, a large number of agents work in parallel in a multi-agent
model and its evolutionary computation. Therefore, we adopted parallel processing
in accordance with these parallel properties of the brain and of nature. We
implement a parallel processing algorithm on the A-NET (Actors NETwork) parallel
object-oriented computer, and show its usefulness.

2 Soft DNA and its Evolutionary Computation 
2.1 Soft DNA 

We propose a new method for the development of intelligence in order to realize
dynamic cooperation among intelligent agents. We call this new method “Soft DNA
(Soft computing oriented Data driven fuNctional scheduling Architecture)”. Soft
DNA aims to imitate the idea of a developmental process, such as the body
plans of actual life based on biological DNA (DeoxyriboNucleic Acid). Fig. 3 shows
the image of soft DNA compared with biological DNA. In biological DNA, the
genes called “homeo box genes” dynamically control the body development of
an individual in actual life based on the concentrations of proteins in cells. The
control architecture called “soft DNA” dynamically controls the development
of intelligence in each agent based on environmental information, in order to
achieve dynamic cooperation. Biological DNA has sets of genes that are related
to each body part such as the head, chest, abdomen, and tail. These sets of
genes are each called a homeo box. Similarly, soft DNA has boxes of intelligence
(made by soft computing, i.e., associative memories, neural networks, fuzzy logic,
chaos, and so on) that are related to various environments, and a suitable box
of intelligence is developed according to the environmental information.

Fig. 3. Image of soft DNA



2.2 Evolutionary Computation on Soft DNA 

The proposed soft DNA consists of several boxes, each made from an associative
memory system named FAMOUS (Fuzzy Associative Memory Organizing Units
System). We use CFAMOUS (Chaotic FAMOUS) to carry out evolutionary
computation on soft DNA. We simulate the proposed evolutionary computation
in parallel on the A-NET parallel computer, which is explained in the next section.




Fig. 4. Fuzzy associative memory organizing units system 



FAMOUS (Fig. 4) represents fuzzy knowledge using several BAMs (Bidirectional
Associative Memories) [2]. This system performs fuzzy associative
inference that causes an input pattern to approach the nearest memorized pattern
using top-down and bottom-up processing (i.e., network reverberation), which
propagates the activation values of each node. CFAMOUS applies chaotic retrieval to
the retrieval process of the memorized patterns in FAMOUS. The chaotic steepest
descent method (CSD method) is used as the chaotic retrieval method. This
method chaotically itinerates among the local minima of the energy function
of the neural network. CFAMOUS has two functions. First, memorized patterns
near the external input pattern are dynamically retrieved, and this range
is restricted by one parameter which defines the degree of system nonlinearity.
Second, non-memorized but valid patterns can be retrieved as well as memorized
ones. We explain the proposed evolutionary computation in detail in Section 6.
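The reverberation between the two layers of a BAM can be illustrated with a minimal sketch. It is a plain bipolar BAM with Hebbian (outer-product) weights, not FAMOUS itself: the fuzzy layering and the chaotic (CSD) retrieval of CFAMOUS are not modelled, and the names `train_bam` and `recall` are hypothetical.

```python
def sign_keep(values, previous):
    """Bipolar threshold; a zero net input keeps the previous state."""
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(values, previous)]

def train_bam(pairs):
    """Hebbian weights W[i][j] = sum over stored pairs of a[i] * b[j]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += a[i] * b[j]
    return weights

def recall(weights, a, max_iters=20):
    """Pass activation back and forth until both layers stop changing."""
    n, m = len(weights), len(weights[0])
    b = [1] * m
    for _ in range(max_iters):
        new_b = sign_keep(
            [sum(a[i] * weights[i][j] for i in range(n)) for j in range(m)], b)
        new_a = sign_keep(
            [sum(weights[i][j] * new_b[j] for j in range(m)) for i in range(n)], a)
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return a, b

# Two stored pattern pairs (assumed toy data) and a noisy probe of the first one.
A1, B1 = [1, -1, 1, -1, 1, -1], [1, 1, -1, -1]
A2, B2 = [1, 1, 1, -1, -1, -1], [1, -1, 1, -1]
W = train_bam([(A1, B1), (A2, B2)])
noisy_a1 = [1, -1, 1, -1, 1, 1]      # last bit flipped
restored_a, restored_b = recall(W, noisy_a1)
```

In this toy case the noisy input settles on the stored pair; a chaotic retrieval scheme would instead keep wandering among such stable states.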

3 The A-NET parallel computer 

Baba et al. have been developing and studying the A-NET
(Actors NETwork) parallel computer. A-NET has a parallel object-oriented
total architecture, and allows users to describe parallel programs naturally by
using A-NETL (A-NET Language), a parallel object-oriented language. Each node
processor of this computer consists of a processing element (PE), a local memory,
and a router, and optional network topologies are provided. The unit
of parallel processing in A-NETL is an object, which consists of data and
procedures (methods). In A-NETL, objects cooperatively send and receive
messages and process them in parallel.
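The object-and-message style described above can be approximated on an ordinary machine. The following is only an analogy written with Python threads and queues; it is not A-NETL syntax, and the class `Actor` and function `square_all` are hypothetical names introduced for illustration.

```python
import queue
import threading

class Actor(threading.Thread):
    """An object with private state that reacts to messages from its mailbox."""

    def __init__(self, handler):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.handler = handler

    def send(self, message):
        self.mailbox.put(message)

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:        # None is used here as a stop message
                break
            self.handler(message)

def square_all(values, num_workers=3):
    """Distribute values to worker actors; each sends its result to a shared queue."""
    results = queue.Queue()
    workers = [Actor(lambda v: results.put(v * v)) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for index, value in enumerate(values):
        workers[index % num_workers].send(value)
    for worker in workers:
        worker.send(None)
        worker.join()
    return sorted(results.get() for _ in range(len(values)))
```

Each actor processes its own mailbox independently, which mirrors the idea that the object is the unit of parallel processing.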

4 Multi-agent robots which move abreast 

We explain multi-agent robots which move abreast as a fundamental example
of cooperative work in welfare robots. When two people walk together, they
generally keep in step with one another. When turning, the outside person walks
faster, while the inside person slows down. We have applied this idea
to two robots which move abreast, as shown in Fig. 5. In this model, a robot
recognizes its situation, i.e., where it is, and then carries out chaotic
retrieval to adapt itself to that situation.




Fig. 5. Multi-agent robots which move abreast 







Fig. 6. Robot control block 



The robot control block is shown in Fig. 6. This control block is constructed
based on the proposed agent model shown in Fig. 2. The upper network of this
figure has the knowledge needed to adapt to changes in the situation. The
output of the lower rule depends on this knowledge. The robots estimate their
movement in the estimation network. These networks are realized by associative
memory networks based on FAMOUS or CFAMOUS. More concretely, the lower
rule and the upper network are constructed from a CFS (Conceptual Fuzzy Set) [2],
as shown in Fig. 7. This CFS is constructed as a hierarchical network applying
(C)FAMOUS. The bottom of this figure shows the lower rule and the top shows
the upper network. In the lower rule, the left is the numerical value input and the
right is the numerical value output. The outputs are the speed and the steering value. In
this application, a box of soft DNA is made from the CFS shown in Fig. 7. New soft
DNA is created by chaotic retrieval on this CFS network. Each intelligence, as a
box of soft DNA, can evolve separately according to environmental information
such as whether the mobile agent is on the inside or the outside when turning abreast. In order
to acquire suitable knowledge (i.e., good soft DNA) in the upper network, we use
the proposed evolutionary computation. We create a new generation of various
agent robots with new knowledge by applying chaotic retrieval in the upper
network.

Fig. 7. Lower rule and upper network using CFS (Conceptual Fuzzy Set)



5 ITS (Intelligent Transport Systems) 

ITS are systems which realize safer and more efficient traffic and transportation
by constructing intelligent automobiles and road environments. For
example, the platooning of automobiles has been studied for
that purpose. Platooning means automobiles running as a platoon (i.e., a group).
In platooning, if all platoons or all automobiles run based on exactly the
same control knowledge, the traffic system as a whole may fail. Therefore,
it seems that each platoon or each automobile needs some fluctuation in
its intelligence so that it moves differently from the others. If we consider
a platoon to be a group of automobile agents, we can apply the proposed
evolutionary computation to platooning. The basic control block is realized
by CFS, as in the case of the multi-agent robots which move abreast.

In ITS platooning, it seems that the desirable intelligence of an automobile should
change according to whether the automobile runs as a head vehicle, as a
middle vehicle or as a tail vehicle in the platoon. The situation of each automobile
changes dynamically; for example, the current middle vehicle becomes the head vehicle
when the platoon divides. Therefore, we apply evolutionary computation
on soft DNA to each ITS automobile and aim to realize the dynamic development
of intelligence.

6 Simulation results and discussion 

6.1 Multi-agent robots which move abreast

First, we explain the results of evolutionary computation based on chaotic
retrieval and soft DNA in multi-agent robots which move abreast. Second, we
explain the results of parallel processing on agent robots. Fig. 8 shows an
example of the evolutionary process. This figure shows the case
where the agents are given the “Turn Left” instruction. At first, the two agent robots
cannot move abreast because they move based on the same control value. The
evolutionary computation creates a new generation of various agents with new
knowledge (i.e., a box containing new soft DNA) by applying chaotic retrieval
in the agent's brain (i.e., associative memories). The generation consists of a
group of pairs of agents which try to move abreast. These pairs
are kept or killed based on an estimate of their movement. Finally, the
agent with suitable knowledge (i.e., good soft DNA) for moving abreast is kept.
Evolutionary computation is performed in the same way for all situations (i.e.,
for all instructions).
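The create, estimate, keep-or-kill cycle described above can be sketched as a loop. Everything concrete here is an assumption made for illustration: the logistic map stands in for chaotic retrieval (the paper uses the CSD method on associative memories), the "knowledge" is reduced to a single outer-to-inner speed ratio, and `evolve_pair`, `target_ratio` and `tolerance` are hypothetical names.

```python
def logistic_map(x, r=4.0):
    """A standard chaotic map, used only as a stand-in source of variation."""
    return r * x * (1.0 - x)

def evolve_pair(target_ratio, tolerance=0.05, x0=0.3, max_pairs=1000):
    """Create candidate pairs until one turns abreast well enough.

    Returns (ratio, pairs_created); ratio is None if no pair was kept.
    """
    x = x0
    for created in range(1, max_pairs + 1):
        x = logistic_map(x)              # new candidate knowledge
        inner_speed = 1.0
        outer_speed = 1.0 + x            # the outer robot must move faster
        ratio = outer_speed / inner_speed
        if abs(ratio - target_ratio) <= tolerance:
            return ratio, created        # this pair is kept
        # otherwise the pair is killed and a new one is created
    return None, max_pairs

kept_ratio, pairs_created = evolve_pair(target_ratio=1.5)
```

The number of pairs created before success plays the same role as the counts reported per instruction in the tables of this section.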



Fig. 8. An example of the evolutionary process 



Table 1. The number of pairs of agents created for each instruction before an agent
with suitable knowledge is acquired (parallel algorithm of evolutionary
computation)

Instruction               Right &  Turn     Right &  Left &   Turn     Left &
                          Large    Right    Small    Large    Left     Small
Number of pairs of        26       5        10       27       6        9
created agents

Table 1 shows the result for the parallel algorithm of evolutionary computation,
that is, the number of pairs of agents created for each instruction
before an agent with suitable knowledge is acquired. Each knowledge item is
acquired after creating about 14 pairs of agents on average. In this parallel
evolutionary process, each knowledge item is created separately; that is, plural pairs of
agents are created and perform evolutionary computation concurrently. Finally,
all knowledge items are integrated on the associative memory network of CFAMOUS.
This integration of knowledge was realized by utilizing evolutionary computation
using CFAMOUS. It is difficult for conventional neural networks to realize this
evolutionary parallel computation.

Table 2. The number of pairs of agents created for each instruction before an agent
with suitable knowledge is acquired (serial algorithm of evolutionary
computation; total number is 67)

Instruction               Right &  Turn     Right &  Left &   Turn     Left &
                          Large    Right    Small    Large    Left     Small
Number of pairs of        26       9        2        22       5        3
created agents

After the agent robots had acquired suitable knowledge for all instructions, we
verified their movement by means of computer simulation. The results are shown
in Fig. 9. As shown, the robots move abreast suitably when a series of instructions
is given to them.



Fig. 9. The movement of robots (Instructions: Turn Left -> Right & Small -> Right
& Large)



Table 2 shows the result for the serial algorithm of evolutionary computation,
that is, the number of pairs of agents created for each instruction
before an agent with suitable knowledge is acquired. In this serial evolutionary
process, one pair of agents repeats evolutionary computation serially to acquire
all the suitable knowledge items.

In the parallel algorithm, the maximum number of created pairs is 27, for the
“Left & Large” instruction, as shown in Table 1. The processing time of the parallel
algorithm is directly proportional to this number (27). In the serial algorithm, the
total number of created pairs over all situations is 67, from Table 2. The
processing time of the serial algorithm is directly proportional to this total number
(67). Therefore, in terms of the number of pairs of created agents, the parallel
algorithm of evolutionary computation is about 2.5 times faster than the serial
algorithm (67/27 ≈ 2.5).
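The arithmetic behind this comparison can be checked directly from the two tables. The sketch below only restates the reported counts; the helper names `parallel_cost` and `serial_cost` are hypothetical.

```python
# Pairs of agents created per instruction, as reported in Tables 1 and 2.
parallel_pairs = {
    "Right & Large": 26, "Turn Right": 5, "Right & Small": 10,
    "Left & Large": 27, "Turn Left": 6, "Left & Small": 9,
}
serial_pairs = {
    "Right & Large": 26, "Turn Right": 9, "Right & Small": 2,
    "Left & Large": 22, "Turn Left": 5, "Left & Small": 3,
}

def parallel_cost(pairs):
    """All instructions evolve concurrently, so the slowest one dominates."""
    return max(pairs.values())

def serial_cost(pairs):
    """Instructions are handled one after another, so the counts add up."""
    return sum(pairs.values())

mean_parallel = sum(parallel_pairs.values()) / len(parallel_pairs)
speedup = serial_cost(serial_pairs) / parallel_cost(parallel_pairs)
```

The mean of the parallel counts is 83/6, which is the "about 14 pairs" quoted in the text, and the ratio 67/27 gives the factor of about 2.5.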

We also wrote a parallel program which simulates, in parallel, the associative inference
and the movement of the robots in multi-agent robots which move abreast,
and implemented it on the A-NET parallel computer. The simulation results
show that the parallel program is about 11 times faster than the conventional serial
program.

6.2 ITS (Intelligent Transport Systems)

We apply the proposed evolutionary computation on soft DNA to the platooning
of ITS (Intelligent Transport System). In this application, each box of soft DNA
is made from the CFS, and each box develops the intelligence for one role, such
as the role of a head vehicle, a middle vehicle or a tail vehicle in the platoon,
according to environmental information. We show by simulation that soft DNA is useful for
realizing dynamic cooperation. Furthermore, we show that
the proposed evolutionary computation improves the control performance for the
distance between vehicles in ITS platooning. The simulation results showed that
the average error between the ideal distance and the actual one improved about
1/4 by the evolved soft DNA.

7 Conclusion 

This paper proposed an evolutionary computation algorithm based on chaotic dynamics
and Soft DNA, instead of the conventional GA (Genetic Algorithm), for
intelligent agents such as welfare robots which assist humans. This evolutionary
computation was realized by applying chaotic retrieval to associative memories. We
applied this evolutionary computation to multi-agent robots which move abreast.
The process of this evolutionary computation is essentially parallel.
Therefore, we implemented its parallel processing algorithm on the A-NET (Actors
NETwork) parallel object-oriented computer, and showed the usefulness of parallel
processing for the proposed evolutionary computation.



Abstract. In this paper, we propose a new approximate clustering algorithm
that improves the precision of top-down clustering. Top-down
clustering was proposed by Iwayama et al. to improve clustering speed:
the cluster tree is generated by sampling some documents, making
a cluster from these, assigning the other documents to the nearest node
and, if the number of assigned documents is large, continuing sampling
and clustering from the top down. To improve the precision of the top-down
clustering method, we propose selecting documents by applying a GA to
decide a quasi-optimum layer, using an MDL criterion to evaluate
the layer structure of the cluster tree.

Keywords: Document Retrieval, Bayesian Clustering, Minimum Description
Length Criteria, Genetic Algorithm



1 Introduction 

Recently, document retrieval based on similarity has become an active
research area. Iwayama et al. proposed a hierarchical clustering method for
document retrieval based on similarity. They call the algorithm Hierarchical Bayesian
Clustering (HBC) [1]. When the number of documents is N, the time required for
a simple exhaustive search is O(N). When a prearranged cluster is used, the
required time is O(log N). However, the amount of calculation needed to make a cluster
is O(N^2), so it is extremely difficult to make a cluster of a large number
of documents with conventional systems. They therefore proposed an approximate
clustering technique for applying HBC to large document sets. The
basic idea of the approximation is to decrease the processing time for deciding a layer
by computing the similarity from selected documents instead of from all documents.
However, in Iwayama's proposed method, the layer is not always optimum because
documents are selected at random.

We propose to select documents by applying a genetic algorithm (referred to
hereafter as GA) [2] to decide a quasi-optimum layer, using an MDL criterion
to evaluate the layer structure of a cluster tree. Our method gives better
accuracy than Iwayama's method, because the layer structure of a cluster tree
constructed by our method is quasi-optimum. The advantage of the GA-based
algorithm is that it is known to converge quickly compared with other optimization
methods.

In this paper, we report speed comparison results between our method and
a strict clustering technique, and also report evaluation results for precision
compared to Iwayama's method.



2 Strict Clustering by HBC

HBC uses evaluation parameters which take account of word appearance
frequency. It is known that, compared to a general clustering method such as that
of Ward, the clustering precision of HBC is higher [1]. Here, the detailed
clustering algorithm of HBC and Iwayama's approximate clustering algorithm
are described.

2.1 HBC Algorithm

In this method, the measure of nearness is the posterior probability $P(c_i \mid d_{test})$,
the probability that the test document $d_{test}$ is classified into a cluster $c_i$:

    $$P(c_i \mid d_{test}) = P(c_i) \sum_{t} \frac{P(T = t \mid c_i)\, P(T = t \mid d_{test})}{P(T = t)} \qquad (1)$$

Here “T = t” means that a term T randomly selected from the document $d_{test}$ is equal to t.
Probabilities on the right-hand side of this equation are estimated as follows:

- $P(T = t \mid d_{test})$: relative frequency of a term t in a test document $d_{test}$.

- $P(T = t \mid c_i)$: relative frequency of a term t in a cluster $c_i$.

- $P(T = t)$: relative frequency of a term t in the entire set of training documents.

- $P(c_i)$: relative frequency of documents that belong to $c_i$ in the entire set of
training documents.

HBC constructs a cluster hierarchy (also called a dendrogram) from bottom to top
by merging two clusters at a time. At the beginning, each document belongs to
a cluster whose only member is the document itself. For every pair of clusters,
HBC calculates the increase in posterior probability after merging the pair,
selects the pair that results in the maximum increase, and merges those clusters
to form a new cluster.

To see the details of this merge process, consider a merge step $k+1$ ($0 \le k \le N-1$).
By the step $k+1$, a data collection of N data $D = \{d_1, d_2, \ldots, d_N\}$ has
been partitioned into a set of clusters $C_k = \{c_1, c_2, \ldots\}$. That is, each datum
$d_i \in D$ belongs to a cluster $c_j \in C_k$. The overall posterior probability at this
point becomes

    $$P(C_k \mid D) = \prod_{c_j \in C_k} \prod_{d_i \in c_j} P(c_j \mid d_i) \qquad (2)$$

The set of clusters $C_k$ is updated as follows:

    $$C_{k+1} = C_k - \{c_x, c_y\} + \{c_x \cup c_y\} \qquad (3)$$

After the merge, the posterior probability is inductively updated as follows:

    $$P(C_{k+1} \mid D) = P(C_k \mid D)\, \frac{\prod_{d_i \in c_x \cup c_y} P(c_x \cup c_y \mid d_i)}{\prod_{d_i \in c_x} P(c_x \mid d_i) \prod_{d_i \in c_y} P(c_y \mid d_i)} \qquad (4)$$

When clustering is performed with the above algorithm, the number of
calculations of evaluation values of the posterior probability is:

    $${}_{N}C_{2} + \sum_{k=1}^{N-2} k = (N-1)^2 = O(N^2) \qquad (5)$$

Hence, when a large number of documents is handled, cluster generation becomes
more difficult as the number of documents increases.
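Equation (1) can be made concrete with a small sketch that estimates each probability as a relative frequency, as listed above. It is an illustration only: documents are assumed to be lists of term strings, terms of the test document that never occur in the training set are skipped (an assumption, since the text does not say how they are treated), and the names `rel_freq` and `posterior` are hypothetical.

```python
from collections import Counter

def rel_freq(terms):
    """Relative frequency of each term in a list of terms."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def posterior(cluster_docs, test_doc, all_docs):
    """P(c|d) = P(c) * sum_t P(T=t|c) * P(T=t|d) / P(T=t), as in Eq. (1)."""
    p_t_given_d = rel_freq(test_doc)
    p_t_given_c = rel_freq([t for doc in cluster_docs for t in doc])
    p_t = rel_freq([t for doc in all_docs for t in doc])
    p_c = len(cluster_docs) / len(all_docs)
    total = 0.0
    for term, p_td in p_t_given_d.items():
        if term in p_t:                      # skip terms unseen in training
            total += p_t_given_c.get(term, 0.0) * p_td / p_t[term]
    return p_c * total

# Toy training set with two clusters (assumed data).
cluster_a = [["ga", "search"], ["ga", "fitness"]]
cluster_b = [["tree", "cluster"], ["tree", "leaf"]]
training = cluster_a + cluster_b
query = ["ga", "ga", "search"]
score_a = posterior(cluster_a, query, training)
score_b = posterior(cluster_b, query, training)
```

The query shares its terms only with the first cluster, so that cluster receives the larger score.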



2.2 Iwayama’s Approximate Clustering Method 

Iwayama et al. proposed an approximate clustering technique called “top-down
clustering”, where the cluster tree is generated from the top down. The
algorithm of top-down clustering is as follows:

1. Select S documents from the document set D at random.

2. Classify the S documents with strict clustering, and make a seed cluster tree.

3. Assign the |D| - S non-selected documents to the nearest leaf node of the seed
cluster tree.

4. If the document set assigned to a single leaf is small enough to treat with
a strict clustering method, construct a tree; otherwise, continue to select
documents and cluster that document set.

Clustering is about twice as fast as with the strict clustering method.
However, in Iwayama's proposed method, the layer is not always optimum because
documents are selected at random.

3 Proposed Clustering Method 

Now, we shall discuss the technique that we propose, an evaluation function for 
finding an optimum document set by this method, and the GA technique we 
use. 



3.1 Algorithm of proposed method 

To improve the precision of Iwayama's clustering method, we propose the following
method, which combines conventional strict clustering with top-down clustering
using GA. Assume that the total number of documents to be clustered is N,
and that the number of documents which can be handled by a strict
clustering method is M.

procedure GA-clustering()
    all documents are assigned to a root document set (D_root);
    D_root is registered in a queue Q;
    while (Q is not empty) {
        a document set D_p at the head of Q is extracted;
        if (the number of documents in D_p < M)
            HBC(D_p);    /* clustering of D_p by HBC */
        else {
            D_d = Select(D_p);
            /* D_d is a document set of M documents, extracted from D_p,
               that is considered to be optimum.
               The coding length of the cluster tree is minimized based on
               an MDL criterion (see 3.2).
               A genetic algorithm is used for the search (see 3.3). */
            HBC(D_d);
            the remaining documents (D_p - D_d) are assigned to the nearest
                leaf (L_i in C_d);
            D_i = the document set assigned to L_i;
            if (number of documents in D_i > 0)
                D_i is added to Q;
        }
    }
endproc
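The control flow of the procedure above can be sketched in runnable form. The sketch is a structural illustration under strong assumptions: documents are distinct numbers, nearness is absolute difference, HBC on a small set is left as a placeholder, and `select` simply takes evenly spaced documents where the paper uses GA with an MDL criterion. All function names are hypothetical.

```python
from collections import deque

def select(docs, m):
    """Stand-in for Select(): m evenly spaced documents from the sorted set."""
    ordered = sorted(docs)
    step = len(ordered) / m
    return [ordered[int(i * step)] for i in range(m)]

def ga_clustering(all_docs, m):
    """Queue-driven top-down clustering; returns a nested tree of dicts."""
    root = {"seed": None, "docs": list(all_docs), "children": []}
    pending = deque([root])
    while pending:
        node = pending.popleft()
        docs = node["docs"]
        if len(docs) < m:
            continue                  # small set: strict clustering (HBC) would run here
        seeds = select(docs, m)
        children = [{"seed": s, "docs": [], "children": []} for s in seeds]
        seed_set = set(seeds)
        for doc in docs:
            if doc in seed_set:
                continue
            nearest = min(children, key=lambda c: abs(c["seed"] - doc))
            nearest["docs"].append(doc)       # assign to the nearest leaf
        node["docs"], node["children"] = [], children
        for child in children:
            if child["docs"]:
                pending.append(child)         # non-empty sets are clustered later
    return root

def collect(node):
    """All documents stored anywhere in the tree (used to check nothing is lost)."""
    found = [] if node["seed"] is None else [node["seed"]]
    found.extend(node["docs"])
    for child in node["children"]:
        found.extend(collect(child))
    return found

documents = [float(i) for i in range(50)]
tree = ga_clustering(documents, m=4)
```

Because every split removes the M selected documents from the set, each queued subset is strictly smaller than its parent and the loop terminates.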

3.2 Evaluation function for finding an optimum document set 

To extract a document set $D_d$ of M documents considered to be optimum
from a document set D, we make a cluster using that document set and use the
coding length of the cluster tree as the basis. As the MDL criterion is used,
the document set which minimizes the coding length of the tree is considered to be
optimum.

The coding length (L) is calculated as the sum of the description length ($L_1$)
required to write the tree, and the description length ($L_2$) for the probabilistic
distribution of documents assigned to leaf nodes.

The coding of a tree follows the universal coding principle. The tree is
traversed in pre-order [3], and it outputs 1 when an internal node is
encountered, and outputs 0 when a leaf node is encountered. In this case, if the
number of leaf nodes is k, the number of internal nodes is k - 1, and the coding
length $L_1$ becomes 2k - 1.

The documents for evaluation are assigned to the nearest leaf nodes. If the
number of documents assigned to a leaf node i is $n_i$, and the probability that
a document assigned to leaf node i is selected from all assigned documents is $P_i$, the
description length $L_2$ is given by the equation

    $$L_2 = -\sum_{i=1}^{k} n_i \log P_i = -\sum_{i=1}^{k} n_i \log \frac{n_i}{R} \qquad (6)$$
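The two parts of the coding length can be computed with a few lines. This sketch reads $P_i$ as $n_i$ divided by the total number of documents assigned to the leaves, uses base-2 logarithms, and lets empty leaves contribute nothing to $L_2$; these three choices are assumptions, since the text does not fix them, and `coding_length` is a hypothetical name.

```python
import math

def coding_length(leaf_counts):
    """L = L1 + L2 for a tree whose leaves received the given document counts."""
    k = len(leaf_counts)
    l1 = 2 * k - 1                               # k leaves and k - 1 internal nodes
    total = sum(leaf_counts)
    l2 = -sum(n * math.log2(n / total) for n in leaf_counts if n > 0)
    return l1 + l2

balanced = coding_length([2, 2])     # two leaves, documents split evenly
skewed = coding_length([4, 0])       # two leaves, all documents in one
```

Under these assumptions the distribution term is zero when all evaluation documents fall into a single leaf and grows as they spread over more leaves, while $L_1$ grows with the number of leaves.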

The documents for evaluating the tree (their number is referred to hereafter as
R) are extracted at random from $D - D_d$, so R is independent of D and $D_d$.

3.3 GA for finding optimum document sets

For the problem of finding an optimum document set, there is no method
for defining a suitable initial value for the search. GA is suitable for this problem,
because GA performs multi-point sampling in parallel. Thus, we apply GA to
find a cluster having a minimal coding length.

In this algorithm, a gene represents a document in the target documents,
and a value of 1 means selected for evaluation whereas 0 means not selected.
A genotype denotes a combination of selected documents. The total number
of genes in a genotype, therefore, is equal to the number of target documents,
and the number of genes having a value of 1 is equal to the number of selected
documents.
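The encoding just described is easy to state in code. The sketch below only shows the representation (one bit per target document, with exactly M bits set); the helper names `random_genotype` and `decode` are hypothetical, and the random initialisation is an assumption about how a starting population might be drawn.

```python
import random

def random_genotype(num_docs, m, rng):
    """A bit list with one gene per document and exactly m genes set to 1."""
    chosen = set(rng.sample(range(num_docs), m))
    return [1 if i in chosen else 0 for i in range(num_docs)]

def decode(genotype, docs):
    """The documents selected for evaluation by a genotype."""
    return [doc for gene, doc in zip(genotype, docs) if gene == 1]

rng = random.Random(0)
genotype = random_genotype(num_docs=10, m=3, rng=rng)
selected = decode([1, 0, 1, 0], ["d1", "d2", "d3", "d4"])
```

Each genotype therefore corresponds to one candidate document set whose cluster tree can be scored by the coding length of Section 3.2.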

Here, the following model is used.

- Scaling: power scaling ($f' = f^{k}$)

- Selection: fitness-proportional strategy and elite preservation strategy

- Crossing and Mutation: In this method, instead of pairing two parents, a
given number of bits (Crossing) in each of the parents, whose number is equal
to the generation gap, is replaced by random bits.

- Generation model: continuous generation model

If the number of documents is N and the number of extracted documents
is M, the merge frequency in the GA-clustering of Section 3.1 is as follows. The merge
frequency of strict clustering of the M extracted documents is $(M-1)^2$. The merge
frequency per document for assignment to a leaf node is $2\alpha \log M$, where $\alpha$ is a number
depending on the degree of balance of the tree, and is in the vicinity of 1 when
the balance is good. The frequency of GA-clustering is $\beta N/M$, where $\beta$ (< 1) also
depends on the balance of the tree. The number of genes to be evaluated per
GA-clustering is $N_{pg} + N_{pg} R_g (N_g - 1)$, where $N_g$, $N_{pg}$ and $R_g$ are respectively the number
of generations, the number of genes per generation and the generation gap. From the
above, it is seen that the total frequency of merges is approximately:

    $$\left((M-1)^2 + 2R\alpha \log M\right)\left(N_{pg} + N_{pg} R_g (N_g - 1)\right)\beta \frac{N}{M} = O(N) \qquad (7)$$
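The linear growth in N claimed by Equation (7) can be checked numerically. In the sketch below the logarithm is taken to base 2, and the values of $\alpha$, $\beta$ and $R_g$ are hypothetical (the generation gap is treated as a fraction); none of these choices comes from the paper. The function name `total_merges` is also hypothetical.

```python
import math

def total_merges(n, m, r, alpha, beta, n_g, n_pg, r_g):
    """Approximate total merge frequency of Eq. (7)."""
    per_tree = (m - 1) ** 2 + 2 * r * alpha * math.log2(m)
    genes_evaluated = n_pg + n_pg * r_g * (n_g - 1)
    return per_tree * genes_evaluated * beta * n / m

# Hypothetical settings: every argument except n is held fixed.
settings = dict(m=16, r=128, alpha=1.0, beta=0.9, n_g=10, n_pg=10, r_g=0.5)
merges_1000 = total_merges(1000, **settings)
merges_2000 = total_merges(2000, **settings)
```

With M, R and the GA parameters fixed, doubling N doubles the estimate, which is what O(N) means here; the constant factor, however, can be large.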

4 Estimation of Clustering Speed and Precision 

Here, we shall show how much the proposed method improves clustering speed 
and precision. 

4.1 Experimental Environment and Measurement Parameters 

A Sun UltraEnterprise 450 (Solaris 2.6, 512 MB) was used. We used patent data, all of
which have a reasonably large document length. We used 21 patents as search
input, and had a professional organization conduct a search for similar patents.
As the target document set, we used all of the patents found by this organization
together with other patents sampled at random from the same period.

4.2 Experimental results: speed

We fixed the parameters relating to GA as follows, and examined the relation
between the number of documents and the processing speed.

M (number of documents extracted)                   16
R (number of documents assigned to leaf nodes)     128
Ng (number of generations)                          10
Npg (number of genes per generation)                10
Rg (generation gap)                                 10
Crossing (number of bits to be changed)              3

Table 1 shows the measured processing times for HBC, which is strict
clustering, and for the proposed method when the number of documents is varied.
It is seen that almost all of the processing time is taken up by merging clusters.



Table 1. Processing Time

Number of documents                50        100        250        500       1000       2500
HBC           Merge part        10.47      43.16     303.97   1,172.64   4,378.84  25,044.56
(sec)         Total             39.45      72.88     380.27   1,572.07   7,509.49  72,306.67
Proposed      Merge part       604.04   3,183.04   6,230.81   7,150.97  13,896.17  52,608.81
method (sec)  Total          1,747.34   4,555.79   7,929.94  11,692.47  28,267.38  92,279.07



Fig. 1 shows the merge frequency of clusters and the time required to merge
clusters for different numbers of documents. “Strict clustering” refers to HBC and
“GA based clustering” refers to the proposed method; the horizontal axis shows
the number of documents and the vertical axis shows merge frequency and merge
time. From this it is found that, whereas with the conventional method
the merge frequency increases in direct proportion to $N^2$, with the proposed method
it is directly proportional to N up to N = 250.

When M is small compared to N, a larger proportion of the clustering
uses GA. Hence, in the proposed method, when N = 500 there is a sharp rise
in the number of gene evaluations and the merge frequency appears to increase
sharply. However, it is expected that, although the merge frequency increases
overall when the value of M is increased, the change-over point will come slightly
later.

4.3 Experimental results: precision 

To determine the search precision, we computed Recall/Precision, taking the
patents cited by the professional organization to be the correct answers. The search
covered 250 patents, including all the patents which were correct answers found
by inputting the above 21 patents.

Fig. 1. Merge Frequency and Merge Time 



The experimental results are shown in Fig. 2. “Exhaustive” is the result of an
exhaustive search without using a cluster, “top-down” is the result of a search
corresponding to Iwayama's top-down approximation algorithm, in which clustering
was performed by extracting M documents at random, and “GA based”
is the result when clustering was performed with 16 generations and 50 genes.
As a result, the proposed method achieves a higher precision than a top-down
approximation search.



Fig. 2. Measured results of clustering precision

5 Conclusions

In this paper, we measured clustering performance using the proposed method.
It was found that, with the proposed method, the number of merges of
clusters was reduced from $O(N^2)$ to $O(N)$, and the time required for one merge
could be kept substantially constant regardless of N.

Also, we determined precision by Recall/Precision to verify that there was
no deterioration of precision due to increased speed. As a result, it was found
that the precision of the proposed method is higher than that of “top-down
approximation clustering”, which is a type of approximation clustering proposed
by Iwayama et al., and that GA functioned effectively.



Abstract. This paper describes investigations into using evolutionary
search for quantitative spectroscopy. Given the spectrum (intensity x
frequency) of a sample of material of interest, we would like to be able to
infer the make-up of the material in terms of percentages by mass of its
constituent compounds. The problem is usually tackled using regression
methods. This approach can have various difficulties. We have cast the
problem as an optimisation task. Using a hybrid of a distributed genetic
algorithm with a local search around the best individual of the population,
very good results have been found, even with noise, for a number of
different instances of the problem, with variations in the range between
6 and 16 constituent compounds. The stochastic optimisation approach
shows great promise in overcoming many of the problems associated with
the more standard regression techniques.

Keywords: Genetic algorithm, quantitative spectroscopy. 



1 Introduction: The Problem 

Given the spectrum (intensity x frequency) of a sample of material of interest, we
would like to be able to infer the make-up of the material in terms of percentages
by mass of its constituent compounds. This is illustrated in Figure 1.






[Figure panels: source material, compound 1, compound 2, compound 3]

Fig. 1. The problem. Given the spectra, what is the make-up of the source material in
terms of the percentages by mass of its constituent compounds?

This important problem in quantitative spectroscopy, occurring widely in
medicine and the chemical industries, is officially referred to as quantitation. A
useful route into solving the problem comes in the form of the Beer-Lambert
Law [1,3]. Simply stated, it claims that when a sample is placed in the beam of a
spectrometer, there is a direct and linear relationship between the amount
(concentration) of its constituent(s) and the absorbance of the sample (the amount
of energy it absorbs). It follows directly from this that the source material's
spectrum is a linear combination of its constituents' spectra. This law forms the
basis of nearly all chemometric methods for spectroscopic data. If the constituent
compounds' spectra are known, then we can cast the problem as an optimal line
fitting exercise of a special kind. Essentially the constituents' spectra are used
as basis functions. The problem is to find the coefficients, $a_i$, in the following
equation (the spectra have all been normalised to equate to the same reference
concentration):

    $$\text{source spectrum} = \sum_{i=1}^{N} a_i \cdot \text{spectrum}_i, \quad \text{where } 0 \le a_i \le 1.$$

There are several different versions of this problem. The simplest is where the 
set of N constituent compounds is known. The problem is to find the proportions 
of each in the source material. In the second version of the problem the number 
of constituent compounds is not known. However, the set of compounds from 
which the constituents might be drawn is known. In the third and hardest version 
of the problem neither the number of constituents nor the full set from which they
might be drawn is known. In all versions of the problem spectrum noise must be
taken into account. We will concentrate on the first version of the problem for 
the rest of this paper. 

The standard way of tackling the problem is to use one of the data fitting 
techniques based on regression f 17161 1. Although varying in complexity and scope, 
there is not one single method that is undeniably the best across all possibili- 
ties of spectral analysis. All in all, the more sophisticated the method, the more 
mathematical and complicated it becomes, often introducing mathematical ar- 
tifacts that may compromise the quality of solution, and the speed of execution. 

All these methods can produce unreliable solutions for problems that involve
a fairly large number of constituent compounds (> 15) with extensively
overlapping spectra, where many of the compounds each make up a similarly
small fraction of the whole (about 1%); typically there will be singularities in
the analytic solutions [8].

However, the quantitation problem can be alternatively cast as an optimisation
problem. Namely, find the set of $a_i$s that minimises the difference between
the source spectrum and that composed of the constituent spectra multiplied by
their respective $a_i$s. In general, the search space will be very large and complex,
so simple gradient methods are unlikely to do well.

This paper describes our investigations into using stochastic search (hybrid 
GA/local search) to tackle the problem. As far as we know, this is the first time 
the quantitation problem has been treated as an optimisation task. Evolutionary 
search has been used on data fitting problems before (e.g. 0 ), but not on this 
kind of task with complex non-analytic basis functions and strong constraints.

The evolutionary computation approach being developed is intended as an
alternative approach to quantitative spectroscopy, one that avoids the drawbacks
of the standard methods, while bringing the additional promise of being able to 
address problems that affect even the current best methods, for instance their 
very weak ability to handle samples with constituents not present in the original 
calibration mixtures. Evolutionary computation seems to be a natural way to 
tackle the latter problem, with its ability to handle variable-length genotypes 
that would represent the concentrations of a variable number of constituents. 

It would be desirable to compare the evolutionary method with current tech- 
niques for this problem. However, since much of the software is proprietary and 
linked to special apparatus, this comparison was left as the next stage of the re- 
search. The aim of the present work is to establish whether or not evolutionary 
computation can be used at all. If it can find accurate solutions to hard instances 
of the problem, it must be considered a serious new candidate for this important 
class of problems. 

2 The Encoding 

A solution to the problem as outlined in the previous section is clearly a set of
$a_i$s, the proportions by mass of the constituents of the source material. A direct
encoding for use with a GA could just be a string of N real numbers representing
the $a_i$s. However, the $a_i$s are all interrelated by the constraint
$\sum_{i=1}^{N} a_i = 1$, which
just reflects the obvious fact that the sum of all proportions by mass must equal
unity. This constraint would invalidate nearly all solutions created by simple 
versions of GA operators, such as crossover and mutation, acting on a direct 
encoding. In the direct encoding we effectively have maximum epistasis. This 
means that either an indirect encoding must be found or appropriate genetic 
operators must be developed. 

For this initial study a simple indirect encoding was devised for use with
simple, cheap genetic operators. In this encoding a genotype is a string of real
numbers: genotype = $x_1\, x_2\, x_3 \ldots x_N$, where $0 \le x_i \le 100$ (any upper
limit could be used, 100 having been chosen for convenience). This is the only
constraint on the values of $x_i$. Figure 2 illustrates how this is decoded into a
candidate solution.

First the $x_i$s are mapped onto the real interval [0, 100]. The constituent
fractions, the $a_i$s, are then allocated as the sub-interval to the right of each $x_i$
position on the line interval. The rightmost $x_i$ is treated differently; the
sub-interval to its right is calculated by wrapping round to the leftmost $x_i$; in
this way it is guaranteed that the constraint mentioned above is obeyed.

This encoding was found to work well. A normalised encoding was also tried, in
which the $a_i$s were calculated from N $x_i$s, directly encoded on the genotype
as reals in the range [0, 100], by dividing each $x_i$ by the sum of all $x_i$s on
the genotype. It also worked well, but not as well as the encoding presented here.
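The two decoding schemes described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `decode` and `decode_normalised` are hypothetical, and the sketch assumes that each gene owns the gap between itself and the next larger gene, with the largest gene wrapping round to the smallest.

```python
def decode(genotype, upper=100.0):
    """Turn gene positions on a circular interval into mass fractions.

    Gene i receives the sub-interval to its right, i.e. the gap up to the
    next larger gene; the rightmost gene wraps round to the leftmost one.
    """
    n = len(genotype)
    order = sorted(range(n), key=lambda i: genotype[i])  # genes from left to right
    fractions = [0.0] * n
    for rank, i in enumerate(order):
        right = order[(rank + 1) % n]
        gap = genotype[right] - genotype[i]
        if rank == n - 1:  # rightmost gene: wrap round the interval
            gap += upper
        fractions[i] = gap / upper
    return fractions


def decode_normalised(genotype):
    """Alternative encoding: divide each gene by the sum of all genes."""
    total = sum(genotype)
    return [x / total for x in genotype]


print(decode([10.0, 60.0, 30.0]))             # [0.2, 0.5, 0.3]
print(decode_normalised([10.0, 60.0, 30.0]))  # [0.1, 0.6, 0.3]
```

Because the gaps tile the whole circular interval, the decoded fractions always sum to one, whatever values the genes take.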
Fig. 2. Decoding the genotype. First the $x_i$s are mapped onto the real interval [0,100],
then the $a_i$s are given by the sub-interval to the right of each $x_i$'s position on the line. The
full interval is treated as circular with the end wrapping round to the beginning.



3 Cost Function 

Let us assume that a spectrum S is represented by a discrete set of m points,
as follows: $S = \{s_1, s_2, s_3, \ldots, s_m\}$. Hence, both the spectra of target materials
and of the constituent compounds are stored as sets of m points; in the study
presented below synthetic data was used, generated as sets of m = 500 points.
A candidate solution spectrum, $\hat{S} = \{\hat{s}_i\}$, is built up from the set of $a_i$s through the
relation

$$\hat{s}_i = \sum_{j=1}^{N} a_j s_{ji}, \quad i = 1, 2, 3, \ldots, m,$$

where $\{s_{j1}, s_{j2}, s_{j3}, \ldots, s_{jm}\} = S_j$ is the spectrum of the jth constituent compound. The cost of a solution
is just the squared difference between the target spectrum ($S$) and the candidate
spectrum ($\hat{S}$), that is:

$$C(\hat{S}) = \sum_{i=1}^{m} (s_i - \hat{s}_i)^2.$$

Minimisation of this function
will provide a set of $a_i$s that can account for the observed spectrum. Whether
or not it is the right set will depend on whether or not there is a many-to-one
mapping from sets of $a_i$s to spectra. Investigating this question was an important
part of this research and it shall be discussed again in Section 5.
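As a small worked illustration of the cost function, the sketch below builds a candidate spectrum as a weighted sum of constituent spectra and sums the squared differences from the target. The function names and the three-point toy spectra are assumptions made for the example; the study itself used spectra of 500 points.

```python
def candidate_spectrum(fractions, constituent_spectra):
    """Weighted sum of the constituent spectra, point by point."""
    m = len(constituent_spectra[0])
    return [
        sum(a * s[i] for a, s in zip(fractions, constituent_spectra))
        for i in range(m)
    ]


def cost(fractions, constituent_spectra, target):
    """Sum of squared differences between target and candidate spectra."""
    candidate = candidate_spectrum(fractions, constituent_spectra)
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


# Toy problem: two constituents sampled at three points (hypothetical data).
spectra = [[1.0, 0.0, 2.0], [0.0, 4.0, 2.0]]
target = candidate_spectrum([0.25, 0.75], spectra)  # true mixture

print(cost([0.25, 0.75], spectra, target))  # 0.0 for the true fractions
print(cost([0.50, 0.50], spectra, target))  # 1.0625 for a wrong guess
```

A cost of zero only identifies the true composition if no other set of fractions reproduces the same spectrum, which is the many-to-one question raised in the text.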

4 Implementation 

The problem was tackled with the encoding and cost functions as described 
above, using a distributed GA with local selection similar to that detailed in [2] , 
with a population of size 225 distributed over a 15x15 toroidal grid. 

Recalling that the genotypes are just strings of real numbers (that is,
genotype = $x_1\, x_2\, x_3 \ldots x_N$), simple one-point crossover was used with a probability
of 0.9. Three forms of mutation were used. Type 1: a gene, $x_j$, was mutated by
adding to it a random number from a uniform distribution over the range [-10.0,
10.0]. The probability of a gene mutating was set at 1.0/N, where N is the number
of genes. Type 2: if a gene was to be mutated there was a 1 in 10 chance that
the random number added to it was taken from a uniform distribution over the
range [-40.0, 40.0]. If a mutation event of either of these two kinds caused a gene
value to exceed 100.0 or become negative, its value was calculated by treating

[Figure 3: the ordered genes x1 ... x7 on the interval from 0.0 to 100.0, with the random points r1 and r2 marked]

Fig. 3. Coupled mutation. The set of $x_i$s making up the genotype are ordered. Two
random points in this set, r1 and r2, are chosen. The whole of the section of the set
between r1 and r2 is moved to the right/left (50% chance) by some random fraction of
the subinterval immediately to the right/left (ir/il).



the range of possible values [0.0, 100.0] as a circular interval as described earlier.
Referring to Figure 2, the effect of this kind of mutation is to slide the relevant
gene up or down the line interval. This means that a single mutation is likely to
affect only a small number (usually 2) of $a_i$s given the decoding scheme used.
Type 3 is a little more complicated and shall be referred to as coupled mutation.
Its workings are illustrated in Figure 3. It is a problem-specific operator that
proved to be very powerful.
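The type 1 and type 2 mutations, including the circular treatment of out-of-range values, might be sketched as below. The function name `point_mutation` and the use of the modulo operator for the wrap-around are assumptions made for illustration; the per-gene rate of 1/N, the 1 in 10 chance and the two ranges are taken from the text.

```python
import random


def point_mutation(genotype, upper=100.0, rng=random):
    """Type 1 / type 2 mutation with circular wrap-around (illustrative sketch)."""
    n = len(genotype)
    child = list(genotype)
    for j in range(n):
        if rng.random() < 1.0 / n:  # each gene mutates with probability 1/N
            # 1 in 10 mutations use the wide range (type 2), the rest type 1
            span = 40.0 if rng.random() < 0.1 else 10.0
            # values leaving [0, upper] re-enter from the other end
            child[j] = (child[j] + rng.uniform(-span, span)) % upper
    return child


rng = random.Random(0)  # seeded only to make the example repeatable
print(point_mutation([10.0, 60.0, 30.0, 95.0], rng=rng))
```

With this wrap-around a gene at 95.0 that is pushed up by 10.0 reappears at 5.0, so the genotype never needs to be repaired.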

In order to achieve the operation illustrated in the diagram, each of the $x_i$s
between the random points r1 and r2 is increased or decreased by the
same amount. This means that the subintervals in this section of the ordered
set are not altered. Hence, the $a_i$s specified by this part of the ordered set are
unchanged. Only the intervals above and below the specified section are altered;
one increases by exactly the same amount as the other decreases. This has the
effect on the solution of randomly choosing two constituent compounds and
increasing the proportion of one while decreasing the proportion of the other
by the same amount. Although type 1 and type 2 mutations can also have this
effect it is less likely that every possible ‘coupling’ will be ‘tweaked’ in this way.
Type 3 mutation was applied with a probability of 0.2 per genotype. Coupled 
mutation was also used as the operator at the heart of the local search algorithm 
incorporated into the GA. Heuristically motivated problem specific encodings 
and operators are becoming more and more popular in GA application |^, and 
can often produce significant improvements in performance. After every pseudo- 
generation a simple local search algorithm - using the coupled mutation operator 
- was run (200 times) on the fittest individual in the population. If a better 
solution was found using this algorithm, it replaced the original best solution 
and the GA continued running. The addition of this local search was found to 
be quite effective, speeding up the search, particularly in the early stages. Run 
by itself, it produced fairly poor results. 
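A possible reading of coupled mutation and of the local search built on it is sketched below. Several details are assumptions, since the text does not fix them: the two random points may coincide, the shift is a uniform random fraction of the adjacent gap, and the local search is written as a simple accept-if-better hill climb over 200 trials. The helper `fractions_of` repeats the circular decoding so that the block is self-contained; all names are hypothetical.

```python
import random


def fractions_of(genotype, upper=100.0):
    """Circular decoding: each gene owns the gap up to the next larger gene."""
    n = len(genotype)
    order = sorted(range(n), key=lambda i: genotype[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        gap = genotype[order[(rank + 1) % n]] - genotype[i]
        out[i] = (gap + upper if rank == n - 1 else gap) / upper
    return out


def coupled_mutation(genotype, upper=100.0, rng=random):
    """Slide a contiguous section of the ordered genes left or right."""
    n = len(genotype)
    order = sorted(range(n), key=lambda i: genotype[i])
    r1, r2 = sorted((rng.randrange(n), rng.randrange(n)))
    if rng.random() < 0.5:  # move right, into the gap after the section
        nxt = order[(r2 + 1) % n]
        room = (genotype[nxt] - genotype[order[r2]]) % upper
        shift = rng.random() * room
    else:                   # move left, into the gap before the section
        prv = order[(r1 - 1) % n]
        room = (genotype[order[r1]] - genotype[prv]) % upper
        shift = -rng.random() * room
    child = list(genotype)
    for rank in range(r1, r2 + 1):
        i = order[rank]
        child[i] = (child[i] + shift) % upper  # same shift for the whole section
    return child


def local_search(genotype, cost, steps=200, rng=random):
    """Accept-if-better hill climb using coupled mutation (assumed scheme)."""
    best, best_cost = list(genotype), cost(genotype)
    for _ in range(steps):
        trial = coupled_mutation(best, rng=rng)
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best, best_cost
```

Because every gene in the section moves by the same amount, only the two gaps bordering the section change, so at most two decoded fractions differ between parent and child and their changes cancel.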
5 Results

A number of synthetic problems were randomly generated by means of random
reference constituent spectra. The number, height and width of peaks were chosen
randomly (subject to a fixed maximum peak height). The proportions
of the constituents in the target compound were then randomly generated and
the resulting linear sum of spectra (the target material spectrum) was created.
In this way a large number of test problems were created with varying numbers
of constituent compounds and also with varying amounts of noise introduced
into the data. In all of the problems the number of constituents was known, as
well as the set of constituents.

Using this setup, very good results were consistently found. The problems
used between 6 and 16 constituent compounds. Note that a problem size of 16 is
considered to be very large in this area. The distributed GA incorporating local
search on the best individual was able to consistently find solutions with almost
zero cost in terms of spectrum difference; these always turned out to be correct
to two or three significant figures on all the compound percentages.

A number of baseline experiments were carried out on the problems detailed
below. Random search was applied to each. Not surprisingly, it only ever found
very poor solutions. Local gradient descent was also tried. The coupled mutation
operator described earlier was exhaustively run in small steps on all pairs of $a_i$s
in a solution. This created the set of nearest neighbours. The search moved to the
best of these, and the process was repeated until no further progress was made.
This was much better than random
search and often found fairly good solutions. However, they were never down in
the very low regions the GA was able to reach. Lastly, running the GA without
the local search generally resulted in good solutions. However, they were not as
reliably excellent as with the local search in combination with the GA.

No noise. The first set of experiments involved problems of size 6, 8, 12 and
16 where there was no noise on the spectral data. Results are shown in the first
half of Table 1. A single random problem was generated for each size and 25
runs of 300 generations were performed on each of these. The left-hand columns
of the table show the cost of the worst, average and best solutions found. This
cost is just the spectra difference measure used in the cost function detailed
earlier. The right-hand columns show the corresponding differences between the
set of constituent percentages in the target and those given by the solution. This
difference (referred to as composition error) is the root mean square difference
between the vector of $a_i$s given in the solution ($\hat{a}_i$) and the actual percentages
($a_i$), that is:

$$\text{Error} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{a}_i - a_i)^2}.$$
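The composition error can be computed as in the short sketch below, assuming the usual root mean square definition with the mean taken over the N constituents and with both vectors expressed as percentages. The function name and the example values are hypothetical.

```python
import math


def composition_error(estimated, actual):
    """Root mean square difference between two composition vectors."""
    n = len(actual)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)


# Hypothetical three-compound example, values in percent by mass.
print(composition_error([35.0, 50.0, 15.0], [35.1, 49.9, 15.0]))
```

In this example two percentages are each off by 0.1, which gives an error of roughly 0.08, the level the text identifies as the boundary of almost exactly correct solutions.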

We can see that both sets of errors are reduced to very low values. The 
good news is that reducing the spectra difference (the only thing available from 
observable data) does result in solutions that give the constituent percentages to 
a very high level of accuracy (2 or 3 sig. figs.). The important thing to know is 
that once the composition error goes below about 0.08, the solutions are almost 
exactly correct; above this level some of the percentages are not quite right. There 
is a very strong correlation between where the solution costs (spectra difference)

Number of    25 runs on 1 random problem                1 run on 25 random problems
Components   Spectrum error      Composition error      Spectrum error      Composition error
             Worst  Avg.  Best   Worst  Avg.  Best      Worst  Avg.  Best   Worst  Avg.  Best
 6            5.3   2.6   1.0    0.08   0.04  0.02       5.5   2.5   1.0    0.1    0.05  0.02
 8            7.2   3.2   1.0    0.09   0.04  0.02       6.8   3.2   0.0    0.09   0.05  0.02
12           10.2   7.3   1.0    0.13   0.06  0.04      11.4   7.8   1.0    0.17   0.08  0.03
16           12.5   7.4   1.0    0.18   0.07  0.04      13.9   7.4   1.0    0.19   0.07  0.04

Table 1. In the first half of the table, a different random problem, of the appropriate 
size, was generated for each row; the results given are taken from 25 runs on the single 
problem, and refer to the best solution found in a single run. In the second half of the 
table, for each row, 25 random problems were generated, of the appropriate size; the 
results given are taken from a single run of each of the 25 problems, and refer to the 
best solution found in a single run. Each run lasts 300 generations. 



go below about 12.0 and where the accuracy of the constituent percentages 
becomes extremely high. If this had not been the case the method would not be 
applicable. Results from the first half of Table 1 show that the method is reliable:
every single run gave a very accurate solution.

The second half of Table 1 gives results averaged over single runs of 25
problems each for the four different sizes. Again the results are very good. These 
sets include very difficult problems such as 16 constituent compounds where 
most are only present in amounts between 0.5% and 1.0% and a few dominate 
the mixture with much higher percentages (something like 35% or 50%). Even in 
these cases the evolutionary method finds solutions where all the percentages are 
correct to a fraction of a percent of the true value. It is important to note that 
these results are far more accurate than standard methods are able to achieve. 

Noise. The second set of experiments involved adding noise to the spectral
data, making them unreliable. Amounts of 2% and 5% noise were added to
the problem from the first half of Table 1. Every time a spectrum was used
in an evaluation its points were randomly moved within these limits, giving a
slightly different version every time. Each evaluation was repeated 5 times and
the average cost was used in the selection algorithm. As can be seen in Table 2,
the results are a little worse than with no noise, but not much. The important
thing is that the best solutions are still accurate in terms of the percentages of
the constituents. The fact that this method inherently uses the whole spectrum
helps in averaging out the effects of noise. This is a very promising result, as all
real data is noisy.
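The noisy evaluation might look like the sketch below. The noise model is an assumption: each point is scaled by a factor drawn uniformly from within the stated percentage, and fresh noise is applied to the target and to every constituent spectrum on each of the 5 repeats. The names `add_noise` and `noisy_cost` are hypothetical.

```python
import random


def add_noise(spectrum, level, rng=random):
    """Move every point by a random relative amount within +/- level."""
    return [s * (1.0 + rng.uniform(-level, level)) for s in spectrum]


def noisy_cost(fractions, constituent_spectra, target, level, repeats=5, rng=random):
    """Average squared spectrum difference over several noisy evaluations."""
    total = 0.0
    for _ in range(repeats):
        noisy_target = add_noise(target, level, rng)
        noisy_parts = [add_noise(s, level, rng) for s in constituent_spectra]
        candidate = [
            sum(a * s[i] for a, s in zip(fractions, noisy_parts))
            for i in range(len(noisy_target))
        ]
        total += sum((t - c) ** 2 for t, c in zip(noisy_target, candidate))
    return total / repeats


spectra = [[1.0, 0.0, 2.0], [0.0, 4.0, 2.0]]  # hypothetical toy spectra
target = [0.25, 3.0, 2.0]                     # mixture with fractions 0.25, 0.75
print(noisy_cost([0.25, 0.75], spectra, target, level=0.02, rng=random.Random(3)))
```

Even the true fractions no longer score exactly zero under noise, which is why averaging over repeated evaluations helps selection distinguish good candidates.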



6 Conclusions 

This paper has described initial investigations into casting the spectroscopy 
quantitation problem as an optimisation task and tackling it with stochastic 
search. Results are very promising. The method can find very accurate solutions

Number of    2% Noise                                   5% Noise
Components   Spectrum error      Composition error      Spectrum error      Composition error
             Worst  Avg.  Best   Worst  Avg.  Best      Worst  Avg.  Best   Worst  Avg.  Best
 6            7.4   4.6   1.7    0.1    0.06  0.03      10.1   5.5   1.9    0.3    0.1   0.06
 8            9.2   5.1   1.0    0.13   0.05  0.03      11.2   6.2   1.3    0.18   0.12  0.06
12           11.3   7.4   1.0    0.15   0.08  0.05      13.3   8.4   2.0    0.21   0.18  0.06
16           12.8   7.7   1.0    0.19   0.08  0.04      13.8   8.6   2.0    0.22   0.16  0.05



Table 2. These results are for the same problem as in the first half of Table 1, but with
2% and 5% noise added. See text for further details.



to large, hard, noisy problems, and appears to be general. It is conceptually
simpler than the standard methods used and does not appear to suffer from the
problems that plague many of these methods (unreliable solutions for certain 
classes of problem, or solutions that are difficult to interpret). 

As already mentioned, an alternative normalised encoding was also tried.
However, empirical evidence showed that the results were a little worse with this
alternative decoding scheme. Importantly, with either encoding, very good results
were achieved only when the coupled mutation operator was used. Hence, the
heuristically motivated coupled mutation appears to be a key factor in the success
of the approach described. Future work will tackle problems where the number
of constituents is not known in advance.

ACK.: This work was supported by grants from FAPESP (96/7200-8), The 
British Council (SPA/126/881) and CNPq (300.465/95-5). We thank P.Bargo 
for conversations on spectroscopy and M.T. Pacheco for suggesting the problem.

Abstract. This paper proposes a method based on evolutionary computation
for recognizing features of CAD data. A feature-based chromosome
scheme is developed in which each locus corresponds to two features of
the CAD data obtained with the Boundary Representation method. The
efficiency of the proposed method is shown through experimental results.

Keywords. CALS (Continuous Acquisition and Lifecycle Support), Evolutionary
Computation, CAD (Computer Aided Design), Boundary Representation
Method



1 Introduction 

Recently, many companies have been integrating CAD/CAM systems to enhance product
quality and productivity, and to reduce overall product life cycle cost. However,
practical CAD data are usually very large and complex, so integrating a
CAD/CAM system into a practical production system is very difficult because
such large CAD data are hard to store in a computer system.

To eliminate this difficulty, data that are common to all the CAD data can be
shared. Sharing common data among a large number of CAD data sets reduces
the total amount of CAD data. The common data to be shared can be obtained
by recognizing features of the CAD data, i.e., the common data are the features
themselves. Such a feature recognition problem can be stated as a combinatorial
optimization problem, which must consider a huge number of combinations [1].

In this paper, we propose a method to recognize features from CAD data
using Evolutionary Computation (EC) for sharing CAD information. In
the proposed method, we employ the Boundary Representation method to develop
a feature-based representation scheme. This scheme is based on two kinds
of information derived from CAD data by using the Boundary Representation
method, and is designed to suit the recognition of features of CAD
data. Furthermore, the efficiency of the proposed method is demonstrated using
some simplified CAD data.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 276-284, 1999.

© Springer-Verlag Berlin Heidelberg 1999

2 Boundary Representation of Solid Model

The solid model shown in Figure 1 can be illustrated as Table 1 by using the 
Boundary Representation method [4]-[6]. The values of Face Type (FT) and 
Edge Type (ET) in Table 1 are given by Table 2. Neighboring Face Loop (NFL) 
represents the relationship between the face and neighboring faces. 



[Figure 1: solid model with labelled faces]

Fig. 1. Solid model

Table 1. Illustration of solid model using Boundary Representation

Face   FT   n_e   NFL (ET)                                                          FS
F1     1    4     F7(1) - F8(1) - F9(1) - F2(1)                                     1.0
F2     1    4     F7(1) - F1(1) - F9(1) - F3(1)                                     1.5
F3     1    4     F7(1) - F2(1) - F9(1) - F4(1)                                     2.0
F4     1    4     F7(1) - F4(1) - F9(1) - F5(1)                                     1.5
F5     1    4     F7(1) - F5(1) - F9(1) - F6(1)                                     1.0
F6     1    4     F7(1) - F5(1) - F9(1) - F10(1)                                    1.0
F7     1    8     F1(1) - F8(1) - F10(1) - F6(1) - F5(1) - F4(1) - F3(1) - F2(1)    1.0
F8     1    4     F7(1) - F1(1) - F9(1) - F10(1)                                    1.0
F9     1    8     F1(1) - F2(1) - F3(1) - F4(1) - F5(1) - F6(1) - F10(1) - F8(1)    1.0
F10    1    4     F7(1) - F5(1) - F9(1) - F5(1)                                     1.0



Table 2. Types of Face and Edge

Face        Type    Edge        Type
Plane       1       Straight    1
Cylinder    2       Ellipse     2
Cone        3       Circle      3
Torus       4
Sphere      5

Face Score (FS) is used to measure the face complexity based upon the convexity
or concavity of the solid model. Face Score can be calculated by equation
(1), where Average Edge Score (AES) is given by equation (2), and Angle Score
(AS) is the score of an edge for its angle (see Figures 2 and 3).

$$\mathrm{FS} = \sum (\mathrm{ET} \times \mathrm{AS}) + \mathrm{AES} \qquad (1)$$

$$\mathrm{AES} = \frac{\sum (\mathrm{ET} \times \mathrm{AS})}{n_e} \qquad (2)$$

where $n_e$ is the number of edges.




[Figure 2: diagram labelled "Angle" and "Smooth Edge"]

Fig. 2. Concept of Edge Score

Fig. 3. Illustration of Angle Score



3 Evolutionary Computation Approach 

3.1 Chromosome Representation and Initialization 

When comparing the target model with a reference model, we must consider all
combinations of faces between the target and reference models. In this paper, to
deal with such numerous combinations, we use the random key representation
for encoding [7]. The random key representation encodes a solution with random
numbers.

Each allele takes a random decimal number from the open interval (0,1).
Here, the alleles are used only as keys for sorting in descending order. Each locus
corresponds to two features: the FS of each face and the Average Face Score (AFS)
of its NFL (see equation (3)).

$$\mathrm{AFS} = \frac{\sum \text{FS of neighboring faces}}{\text{number of neighboring faces}} \qquad (3)$$
For example, the solid model shown in Figure 1 can be coded into the
feature-based chromosome shown in Figure 4. In Figure 4, the reference list can be
obtained by sorting the keys of these features in descending order. The reference
list is compared with every reference model stored in the database on both FS and

             F1     F2     F3     F4     F5     F6     F7     F8     F9     F10
FS           1.000  1.500  2.000  1.500  1.000  1.000  1.000  1.000  1.000  1.000
AFS          1.125  1.125  1.125  1.125  1.125  1.000  1.125  1.000  1.125  1.000
chromosome  [0.51   0.25   0.47   0.98   0.72   0.09   0.86   0.36   0.18   0.74]

                         sort in descending order

reference   [F4     F7     F10    F5     F1     F3     F8     F2     F9     F6  ]
list         0.98   0.86   0.74   0.72   0.51   0.47   0.36   0.25   0.18   0.09

Fig. 4. Chromosome representation

p1 = [0.51  0.25  0.47  0.98  0.72  0.09  0.86  0.36  0.18  0.74]
p2 = [0.48  0.56  0.98  0.57  0.38  0.32  0.11  0.64  0.81  0.41]

randomly generated λ (= 0.32)

0.51 × 0.32 + 0.48 × (1 − 0.32) = 0.49

o1 = [0.49  0.46  0.82  0.70  0.49  0.25  0.35  0.55  0.61  0.52]

Fig. 5. An example of crossover operation



AFS of NFL, and then the reference model that matches best can
be obtained.

The number of genes equals the number of faces. Different chromosomes
yield different orders in which the target model is compared with a reference
model.

In the initialization phase, popsize (the population size) chromosomes are
generated randomly as the initial population.
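The random key representation can be illustrated with the values given for Figure 4. The sketch below sorts the keys in descending order to obtain the reference list, computes the AFS of one face from its neighbours, and generates a random initial population. The function names are hypothetical, and `random.random()` draws from [0, 1) rather than the open interval, a difference that is negligible for this purpose.

```python
import random


def reference_list(faces, keys):
    """Order the faces by their random keys, largest key first."""
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=True)
    return [faces[i] for i in order]


def average_face_score(face, neighbours, fs):
    """Mean FS over the faces in the neighbouring face loop of `face`."""
    scores = [fs[f] for f in neighbours[face]]
    return sum(scores) / len(scores)


def random_population(popsize, n_faces, rng=random):
    """popsize chromosomes, each holding one random key per face."""
    return [[rng.random() for _ in range(n_faces)] for _ in range(popsize)]


faces = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9", "F10"]
keys = [0.51, 0.25, 0.47, 0.98, 0.72, 0.09, 0.86, 0.36, 0.18, 0.74]
print(reference_list(faces, keys))
# ['F4', 'F7', 'F10', 'F5', 'F1', 'F3', 'F8', 'F2', 'F9', 'F6']

fs = {"F2": 1.5, "F7": 1.0, "F8": 1.0, "F9": 1.0}
print(average_face_score("F1", {"F1": ["F7", "F8", "F9", "F2"]}, fs))  # 1.125
```

Since any vector of keys sorts into a valid ordering of the faces, crossover and mutation on the keys can never produce an infeasible ordering.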

3.2 Crossover 

Arithmetical crossover is employed as the crossover operator. Arithmetical
crossover is defined as a convex combination of two vectors. If the constraint
set is convex, this operation ensures that children are feasible if both parents are
feasible. For a pair of parents $p_1$ and $p_2$, the crossover operator can produce two
offspring $o_1$ and $o_2$ as follows:

$$o_1 = \lambda \times p_1 + (1 - \lambda) \times p_2$$

$$o_2 = \lambda \times p_2 + (1 - \lambda) \times p_1$$

where $\lambda$ is randomly generated from (0,1). An example is shown in Figure 5.
Note that the two chromosomes chosen as parents have the same length.

The second offspring $o_2$ is obtained in the same manner by exchanging the
roles of $p_1$ and $p_2$.
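The crossover can be checked numerically against the example of Figure 5. The sketch below is illustrative; the function name and the optional `lam` argument (used here to fix λ at 0.32) are assumptions.

```python
import random


def arithmetic_crossover(p1, p2, lam=None, rng=random):
    """Return two offspring as convex combinations of the parents."""
    if lam is None:
        lam = rng.random()  # weight drawn at random, as in the text
    o1 = [lam * a + (1.0 - lam) * b for a, b in zip(p1, p2)]
    o2 = [lam * b + (1.0 - lam) * a for a, b in zip(p1, p2)]
    return o1, o2


p1 = [0.51, 0.25, 0.47, 0.98, 0.72, 0.09, 0.86, 0.36, 0.18, 0.74]
p2 = [0.48, 0.56, 0.98, 0.57, 0.38, 0.32, 0.11, 0.64, 0.81, 0.41]
o1, o2 = arithmetic_crossover(p1, p2, lam=0.32)
print([round(x, 2) for x in o1])
```

Each offspring gene lies between the two parental genes, so keys that start inside (0,1) stay inside it.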
3.3 Mutation

Mutation is designed to perform swapping mutation, i.e., it selects two
genes randomly in a chromosome and exchanges their positions. An example is
shown in Figure 6.
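Swapping mutation is simple enough to state in a few lines. The sketch below assumes a chromosome with at least two genes; the function name is hypothetical.

```python
import random


def swap_mutation(chromosome, rng=random):
    """Exchange the positions of two randomly chosen genes."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)  # two distinct positions
    child[i], child[j] = child[j], child[i]
    return child


parent = [0.51, 0.25, 0.47, 0.98, 0.72, 0.09, 0.86, 0.36, 0.18, 0.74]
print(swap_mutation(parent, rng=random.Random(0)))
```

The multiset of keys is unchanged; only the assignment of keys to faces, and hence the comparison order, is altered.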



[Figure 6: parent chromosome p and offspring o, with two genes exchanged]

o = [0.51  0.86  0.47  0.98  0.72  0.09  0.25  0.36  0.18  0.74]

Fig. 6. An example of mutation operation



3.4 Evaluation and Selection 

A chromosome (a target solid model) is evaluated by comparing its FS and AFS
lists with those of a reference solid model. If the length of the two lists (the number of
faces) of the target solid model differs from that of the reference solid model,
dummy variables are added to the shorter FS and AFS lists so that all
the lists have equal length. An example of this adjustment is shown in Figure 7.




[Figure 7: FS and AFS lists of two solid models; the AFS values did not survive extraction]

FS list  [1.500  2.000  1.500  1.000  1.000  1.000  1.000  1.000  1.000  1.000]

FS list  [1.250  1.167  1.250  1.167  1.000  1.167  1.000  1.000  99.00  99.00]
         (the last two entries are the additional dummy variables)

Fig. 7. Adjustment of the lengths of FS and AFS lists



The evolution is controlled by the following evaluation function.

$$\delta_{FS_i} = |R_{FS_i} - T_{FS_i}| \qquad (4)$$

$$\delta_{AFS_i} = |R_{AFS_i} - T_{AFS_i}| \qquad (5)$$

$$A = \sum_{i=1}^{n_F} \delta_i, \qquad
\delta_i = \begin{cases}
1, & \delta_1 > \delta_{FS_i} \text{ and } \delta_2 > \delta_{AFS_i} \\
0, & \text{both } \delta_{FS_i} \text{ and } \delta_{AFS_i} \text{ are dummy variables, or } (\delta_1 < \delta_{FS_i} \text{ or } \delta_2 < \delta_{AFS_i})
\end{cases}
\qquad i = 1, 2, \ldots, n_F \qquad (6)$$

where $R_{FS_i}$ is the FS of face i for the reference solid model, $T_{FS_i}$ is the FS of face
i for the target solid model, $R_{AFS_i}$ is the AFS of face i in NFL for the reference solid
model, $T_{AFS_i}$ is the AFS of face i in NFL for the target solid model, A is the total
agreement degree, $n_F$ is the number of faces, and $\delta_1$ and $\delta_2$ are threshold values.
The total agreement degree A is used as the fitness of a chromosome, and the
roulette wheel approach is employed at the selection phase.
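The total agreement degree can be sketched as below. It assumes that the target lists have already been put in the order given by the chromosome's reference list, and that dummy variables take the value 99.0 as in Figure 7, so that a padded position can never fall within the thresholds. The function names are hypothetical.

```python
def pad(values, length, dummy=99.0):
    """Append dummy variables until the list has the requested length."""
    return list(values) + [dummy] * (length - len(values))


def agreement_degree(ref_fs, ref_afs, tgt_fs, tgt_afs, d1=0.1, d2=0.15):
    """Count the faces whose FS and AFS both agree within the thresholds."""
    n = max(len(ref_fs), len(tgt_fs))
    ref_fs, ref_afs = pad(ref_fs, n), pad(ref_afs, n)
    tgt_fs, tgt_afs = pad(tgt_fs, n), pad(tgt_afs, n)
    score = 0
    for rf, ra, tf, ta in zip(ref_fs, ref_afs, tgt_fs, tgt_afs):
        if abs(rf - tf) < d1 and abs(ra - ta) < d2:
            score += 1
    return score


# Hypothetical three-face reference model against a two-face target model.
print(agreement_degree([1.0, 1.5, 2.0], [1.125, 1.125, 1.0],
                       [1.0, 1.5], [1.1, 1.4]))  # 1
```

In the example the first face agrees on both scores, the second fails the AFS threshold, and the third is a dummy position, giving a fitness of 1.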

4 Numerical Experiments 

4.1 Experiment 1 

In order to verify the basic ability of the proposed method, we tested it
by recognizing the target solid model given in Figure 8 among the reference models
shown in Figure 9. There are 6 reference models, and all of them have the same
number of faces as the target model ($n_F = 10$).

The parameters for the genetic algorithm, set to the experimentally best-tuned
values, are: population size popsize = 5, maximum generation
maxgen = 1000, crossover rate $P_c$ = 0.2, mutation rate $P_m$ = 0.7, and the
threshold values $\delta_1$ = 0.1, $\delta_2$ = 0.15. To verify the robustness of the proposed
method, we used such a small population size (= 5), even though it restricts the
performance of the method.

The result is summarized in Table 3, where each value is the average
over 100 runs. In Table 3, reference model D was recognized as the
fittest one with high frequency in 100 runs. Table 3 also includes the results
obtained by using EC employing only FS in the evaluation phase.



4.2 Experiment 2 

We tested our method on a more complex problem, i.e., recognizing a target
solid model among the 16 reference models given in Figure 10.

The evolutionary parameters were set as follows: population size popsize =
20, maximum generation maxgen = 5000, crossover rate $P_c$ = 0.5, mutation
rate $P_m$ = 0.7, and the threshold values $\delta_1$ = 0.1, $\delta_2$ = 0.15. We prepared 16
solid models (A~P) shown in Figure 10. One solid model among the 16 models




Fig. 8. Target solid model

[Figure 9 panels: A, B, C, D, E, F]

Fig. 9. Reference models

Fig. 10. 16 solid models for the Numerical Experiment



was used as the target model, and the remaining 15 models were used as the
reference models. Each of the models A~P was used in turn as the target.

The results are summarized in Table 4, where each result was obtained
as the average over 100 runs. This table shows that the proposed EC-based
method could recognize the appropriate model(s) with high probability in most
cases.

5 Conclusions 

In this paper, we proposed a new method for recognizing features of CAD data
using EC, and we demonstrated the effectiveness of the proposed method through
experimental results.

The results showed that our method performs feature recognition well. In
future work, we will improve the proposed method to deal with more complex
solid models.

Table 3. Results of recognition (1)

          Using FS and AFS                        Using only FS
Result    Frequency   Average    Average          Frequency   Average    Average
                      Fitness    Generation                   Fitness    Generation
B         24/100      7.00       946.79           -           -          -
D         76/100      8.00       27.25            100/100     10.00      44.56



Table 4. Results of recognition (2)

Target    Result    Frequency   Average   Average
(n_F)     (n_F)                 Fitness   Generation
A(10)     B(10)      96/100      8.00     413.906
          H(9)        4/100      8.00      94.000
B(10)     P(12)      56/100      8.00     201.964
          A(10)      23/100      8.00     461.783
          G(10)      19/100      8.00     300.579
          J(8)        2/100      7.00      22.500
C(10)     none
D(10)     H(9)      100/100      7.00      44.280
E(10)     I(8)      100/100      7.00      17.220
F(10)     M(9)      100/100      6.00      51.280
G(10)     P(12)     100/100     10.00     353.190
H(9)      K(8)       53/100      7.00      25.566
          D(10)      42/100      7.00      42.500
          G(10)       3/100      7.00      21.330
          A(10)       1/100      7.00       5.000
          L(8)        1/100      7.00      10.000
I(8)      K(8)      100/100      8.00      75.160
J(8)      K(8)       88/100      7.00      37.318
          B(10)      12/100      7.00      27.583
K(8)      I(8)      100/100      8.00      72.740
L(8)      G(10)      60/100      7.00     277.217
          P(12)      23/100      7.00     287.348
          F(10)      17/100      7.00     154.000
M(9)      F(10)     100/100      6.00      28.310
N(11)     none
O(11)     none
P(12)     G(10)     100/100     10.00     134.900
Abstract. Four different prisoner’s dilemma and associated games were 
studied by running a round robin as well as a population tournament, 
using 15 different strategies. The results were analyzed in terms of 
definitions of generous, even-matched, and greedy strategies. In the round 
robin, prisoner’s dilemma favored greedy strategies. The chicken game and 
the coordination game favored generous strategies, and the compromise 
dilemma favored the even-matched strategy Anti Tit-for-Tat. These results were 
not surprising because all strategies used were fully dependent on the 
mutual encounters, not the actual payoff values of the game. A 
population tournament is a zero-sum game balancing generous and greedy 
strategies. When strategies disappear, the population will form a new 
balance between the remaining strategies. A winning strategy in a 
population tournament has to do well against itself because there will be 
numerous copies of that strategy. A winning strategy must also be good 
at resisting invasion from other competing strategies. These restrictions 
make it natural to look for winning strategies among originally generous 
or even-matched strategies. For three of the games, this was found to be true, 
with originally generous strategies being most successful. The most 
divergent result was that compromise dilemma, despite its close relationship to 
prisoner’s dilemma, had two greedy strategies almost entirely 
dominating the population tournament. 

Keywords: games, simulations, evolutionary stable strategies 



1 Introduction 

In multi-agent systems the concept of game theory is widely in use. There has 
been a lot of research in distributed negotiation, market-oriented 
programming, autonomous agents, and evolutionary game theory. 
The evolution of cooperative behavior among self-interested agents has 
received attention among researchers in political science, economics and 
evolutionary biology. In these disciplines, it has been used from a social science point of 
view to explain observed cooperation, while in multi-agent systems (MAS) it may 
X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 285- |2^ 1999. 
be used to try to create systems with a predicted cooperative behavior. In section 
2 we look at prisoner’s dilemma like games and the Tit-for-Tat (TfT) strategy. 

In evolutionary game theory the focus has been on evolutionarily stable 
strategies (ESS). The agent exploits its knowledge about its own payoffs, but 
no background information or common knowledge is assumed. An evolutionary 
game repeats each move, or sequence of moves, without a memory. In many MAS, 
however, agents frequently use knowledge about other agents. We look at three 
different ways of describing ESSs and compare them to MAS. Firstly, we treat the 
ESS as a Nash equilibrium of different strategies. A Nash equilibrium describes 
a set of chosen strategies where no agent unilaterally wishes to change its choice. 
In MAS, some knowledge about the other agents should be accessible when 
simulating the outcome of strategies. This knowledge (e.g., the payoff matrix of 
another agent, and the knowledge that it maximises its expected utility) makes 
it hard to predict the outcome of the actual conflict. Instead of having a single 
prediction we end up allowing almost any strategy. This is a consequence 
of the so-called Folk Theorem. 

A game can be modeled as a strategic or an extensive game. The former is a 
model of a situation in which each agent chooses a plan of action once and for all, 
and all agents’ decisions are made simultaneously, while the latter specifies the 
possible orders of events. All the agents in this paper use strategic strategies, 
which we classify as generous, even-matched, or greedy. In section 3 the outcomes 
for 15 different strategies are shown as an example of our classification. 

Secondly, the ESS can be described as a collection of successful strategies, 
given a population of different strategies. An ESS is a strategy in the sense that if 
all the members of a population adopt it, then no mutant strategy can invade the 
population under the influence of natural selection. In an evolutionary context, 
we can therefore simply calculate how successful an agent will be. The problem 
is that this is not the same as finding a successful strategy in an iterated game, 
because in this game the agents are supposed to know the history of the moves. 
Instead of finding the best one, we can try to find a possibly sub-optimal but 
robust strategy in a specific environment, and this strategy may eventually be 
an ESS. In section 4 a round robin tournament is held for prisoner’s dilemma like 
games to see what kind of strategy will do best, and population tournaments 
illustrate what successful combinations there are. 

Thirdly, the ESS can be seen as a collection of evolved successful strategies. 
It is possible to simulate a game through a process of two crucial steps: mutation 
(changes in the ways agents act) and selection (choice of the preferred 
strategies). Different kinds of evolutionary computations have been 
applied within the MAS society, but the similarities to biology are restricted.¹ In 
section 5 we introduce noise and the agents become uncertain about the outcome 
of the game, even if they have complete knowledge about the context. 

¹ Firstly, EC uses a fitness function instead of dominant and recessive genes. 
Secondly, there is a crossover between parents instead of the meiotic crossover. 
2 Prisoner’s dilemma like games 

Prisoner’s dilemma was originally formulated as a paradox (in the sense 
of that of Allais and Ellsberg) where the cooperatively preferable solution for 
both agents, low punishment, was not chosen. The reason is that the first agent 
did not know what the second agent intended to do, so he had to guard himself. 
The paradox lies in the fact that both agents had to accept a high penalty in 
spite of the fact that cooperation is a better solution for both of them. 

The one-choice prisoner’s dilemma has one dominant strategy, play defect. If 
the game is iterated there will be other dominating strategies because the agents 
have background information about previous moves. The iterated prisoner’s 
dilemma (IPD) is generally viewed as the major game-theoretical paradigm for 
the evolution of cooperation based on reciprocity. 

When Axelrod and Hamilton analyzed the iterated prisoner’s dilemma 
they found that the cooperating TfT strategy did very well against more 
defecting strategies. All agents are interested in maximizing individual utilities and are 
not pre-disposed to help each other. If an agent cooperates this is not because 
of an undirected altruism but because of a reciprocal altruism favoring a 
selfish agent. The TfT strategy has become an informal guiding principle for 
reciprocal altruism. 

Binmore gives a critical review of the TfT strategy and of Axelrod’s 
simulation. He concludes that TfT is only one of a very large number of equilibrium 
strategies and that TfT is not evolutionarily stable. On the other hand, 
evolutionary pressures will tend to select equilibria for the IPD in which the agents 
cooperate in the long run. In the next section we will look at an alternative 
interpretation. 

3 A simulation example 

In a simulation we used the proportions of (C,C), (C,D), (D,C) and (D,D) to 
analyze the successfulness of a strategy. We have developed a simulation tool in 
which we let 15 different strategies meet each other The different strategies are 
described in [Zj. 

The tournament was conducted in a round robin way so that each 
strategy was paired with each other strategy plus its own twin and a play random 
strategy. Each game in the tournament was played on average 100 times 
(randomly stopped) and repeated 5000 times. The outcomes are shown in Figure 1 
below, where the percentage of (C,C), (C,D), (D,C) and (D,D) for each 
strategy is shown. We will use the proportions of (C,C), (C,D), (D,C) and (D,D) 
as “fingerprints” for the strategy in the given environment, independent of the 
payoff matrix. For some of the strategies this is true without any doubt: 
Always Cooperate (AllC) and Always Defect (AllD) have 100 per cent cooperate 
(C,C)+(C,D) and 100 per cent defect (D,C)+(D,D) respectively. It is possible to 
look at how the proportions of (C,D) compared to (D,C) form different groups 
of strategies. TfT begins with cooperate and then does the same move as the 






S. Johansson, B. Carlsson, and M. Boman 




Fig. 1. Proportions of (C,C), (C,D), (D,C) and (D,D) for different strategies. 



other player did last time. This means that (C,D)≈(D,C) for all payoff matrices, 
so the actual values do not matter. It is possible to treat the other strategies the 
same way because none of them reflect upon their actual payoff value. We will 
instead describe the strategies as generous, even-matched or greedy. 

1. A generous strategy cooperates more than its partners do. This means that 
(C,D)>(D,C), i.e., it is betrayed more often than it itself plays defect against a 
cooperating agent. 

2. An even-matched strategy has (C,D)≈(D,C). This group includes the TfT 
strategy, always doing the same as the other strategy. 

3. A greedy strategy defects more than its partners do. This means (C,D)<(D,C), 
i.e., the opposite of a generous strategy. 
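
The classification above can be illustrated with a small sketch. It is not the authors' simulation tool: it assumes only three strategies (AllC, AllD and TfT), a fixed game length of 100 moves instead of a randomly stopped game, no random strategy, and a hypothetical tolerance of 0.01 for deciding when (C,D)≈(D,C). All function names are illustrative.

```python
from collections import Counter

OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def all_c(opponent_moves):
    return "C"

def all_d(opponent_moves):
    return "D"

def tit_for_tat(opponent_moves):
    # Cooperate first, then repeat the opponent's previous move.
    return opponent_moves[-1] if opponent_moves else "C"

STRATEGIES = {"AllC": all_c, "AllD": all_d, "TfT": tit_for_tat}

def play(strategy_a, strategy_b, rounds):
    """Count the (own move, other move) outcomes as seen by player A."""
    seen_by_a, seen_by_b = [], []  # opponent moves observed by each side
    outcomes = Counter()
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        outcomes[(move_a, move_b)] += 1
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return outcomes

def fingerprint(name, strategies, rounds=100):
    """Proportions of the four outcomes over a round robin (own twin included)."""
    total = Counter()
    for opponent in strategies.values():
        total += play(strategies[name], opponent, rounds)
    games = sum(total.values())
    return {outcome: total[outcome] / games for outcome in OUTCOMES}

def classify(fp, tolerance=0.01):
    """Generous if (C,D) > (D,C), greedy if (C,D) < (D,C), else even-matched."""
    diff = fp[("C", "D")] - fp[("D", "C")]
    if diff > tolerance:
        return "generous"
    if diff < -tolerance:
        return "greedy"
    return "even-matched"

for name in STRATEGIES:
    print(name, classify(fingerprint(name, STRATEGIES)))
```

In this tiny environment AllC comes out as generous, AllD as greedy and TfT as even-matched; with other surrounding strategies the labels may differ.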

The basis of the subdivision above is a zero-sum play. The sum of (C,D) over 
all strategies must equal the sum of (D,C) over all strategies, i.e., if there is a generous 
strategy there must also be a greedy strategy. The classification of a strategy can 
change depending on the surrounding strategies. Theoretically a lot of changes 
are possible, making a generous strategy become an even-matched or a greedy 
strategy, or the reverse. What will happen with a particular 
strategy depends both on the surroundings and on the character of the strategy. As 
an example, AllC will always be generous while 95%C will change to a greedy 
strategy when there are only these two strategies left. 

4 Simulating four different games 

If we let the letters k, l, m and n be the payoffs for (C,C), (C,D), (D,C) 
and (D,D) respectively in a symmetric game, the average payoff for a 
strategy, $E_{avg}(\text{strategy})$, is a function of the payoff matrix and the distribution of the 
payoffs among the four outcomes: 

$$E_{avg}(\text{strategy}) = p(C,C)\,k + p(C,D)\,l + p(D,C)\,m + p(D,D)\,n, \qquad (1)$$

where 

$$p(C,C) + p(C,D) + p(D,C) + p(D,D) = 1. \qquad (2)$$
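
As a worked illustration of Equations (1) and (2), the sketch below computes the average payoff from an outcome distribution. The payoff values (k, l, m, n) = (3, 0, 5, 1) and the outcome proportions are assumptions chosen for the example; the actual payoff matrices of the paper are those of Fig. 2.

```python
import math

def average_payoff(proportions, payoffs):
    """Equation (1): weight each outcome's payoff by its proportion."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("outcome proportions must sum to 1 (Equation 2)")
    return sum(proportions[outcome] * payoffs[outcome] for outcome in proportions)

# Hypothetical prisoner's dilemma payoffs: k = 3, l = 0, m = 5, n = 1.
PD_PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical fingerprint of a strategy in some environment.
example = {("C", "C"): 0.5, ("C", "D"): 0.1, ("D", "C"): 0.1, ("D", "D"): 0.3}

print(average_payoff(example, PD_PAYOFFS))
```

Because the fingerprint does not depend on the payoff values, the same proportions can be re-scored under any of the four games by swapping the payoff dictionary.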

The aim of the simulation is to test how different games behave in a round robin 
tournament and in a population tournament. We used four different games, 
prisoner’s dilemma (PD), chicken game, coordination game and compromise 
dilemma, to illustrate the distributions among different strategies (see 
Figure 2). Additional information about the results of the simulations, definitions of 
the strategies, etc. can be found in 0. It holds for all the games that (D,D) has 
a lower payoff value than (C,C) and for three of the games that (D,C) has the 
highest value. In an earlier paper we have examined the differences between PD 
and chicken game. Compromise dilemma is closely related to chicken game. 
Coordination game is a game with two dominating strategies, playing (C,C) 
or playing (D,D). Rapoport and Guyer give a more detailed description of 
possible 2x2 games. 

Fig. 2. Payoff matrices for prisoner’s dilemma, chicken game, coordination game and 
compromise dilemma. 

We ran a round robin tournament with the 15 strategies for the four different 
games described in Figure 2. The greedy strategies Davis and Friedman 
do well in PD, while the chicken game favors the generous strategies AllC and 
Fair, and the coordination game favors the generous strategy Tf2T. Compromise dilemma favored the 
counter-intuitive strategy ATfT. In our classification TfT is regarded as an 
even-matched strategy. There is no reason for believing PD to favor more generous 
strategies than the rest of the games. Finding successful greedy strategies is well 
in line with the hypothesis that PD, because of the given payoff matrices, is the 
least cooperative game from the generous strategies’ point of view. The chicken 
game is less greedy than PD because it costs more to play defect (0 instead of 
1 in the (D,D) case). The most successful strategies, AllC and Fair, are both 
generous. The coordination game has the highest payoff value for (C,C), but it 
also has a dominating (D,D) value. The generous Tf2T does best, but the 
greedy strategies Davis and Friedman also do well compared to the other 
games. Random and ATfT, two strategies with a big proportion of (C,D)+(D,C), 
do very poorly in this game. In compromise dilemma (C,D)+(D,C) have 
high scores, which favors the two even-matched strategies Random and ATfT.³ 
ATfT has the biggest proportion of (C,D)+(D,C), making it a winning strategy. 

For a full description of the strategies, see 0 

In a population tournament different strategies compete until there is only 
one strategy left or until the number of generations exceeds 10,000. Because of 
changes in the distribution of strategies between different generations it is not 
possible to rely on previous descriptions of the strategies. A generous strategy 
can, for example, be greedy under certain circumstances. On average it must hold 
that there is the same amount of greedy strategies as generous ones, forming 
the even-matched strategies at the position of equilibrium. The population 
tournament was run 100 times for the four different games. It took between 2100 
(compromise dilemma) and 3400 (chicken game) generations on average to find a winner in 
the game. At most a single strategy can win all the 100 times, but in our 
simulation different strategies won different runs. In all, five strategies did not win 
a single game, namely 95%C, ATfT, Feld, Joss and Tester. For the compromise 
dilemma, despite the fact that ATfT was the winner in the round robin 
tournament, the strategy did not win a single game in the population tournament. 
In the prisoner’s dilemma there is a change towards the originally more generous 
strategies Tf2T and Grofman. This is also true for the coordination game, which 
also favors AllC, just as in the round robin tournament. For the chicken game 
the same generous strategies do well as in the PD and the coordination 
game. The most surprising result is the almost total dominance of two greedy 
strategies, Davis and Friedman, in compromise dilemma. Both strategies have a 
large proportion of (D,C) compared to (C,D) in the original round robin 
tournament. We also found the generous strategies to be more stable in the chicken 
game part of the matrix. 

5 Noisy environment 

In the next simulation, we introduced noise on four levels: 0.01, 0.1, 1 and 10%. 
This means that a strategy changes its move to the opposite one in the given 
percentage of cases. 
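
A minimal sketch of such a noise model, assuming that each move is flipped independently with the given probability (the paper does not say how the flips are drawn, and the function name is illustrative):

```python
import random

def noisy_move(intended, noise, rng):
    """Return the opposite move with probability `noise`, else the intended move."""
    if rng.random() < noise:
        return "D" if intended == "C" else "C"
    return intended

rng = random.Random(42)  # fixed seed so that runs are repeatable
# The four noise levels of the text, expressed as probabilities.
for level in (0.0001, 0.001, 0.01, 0.1):
    flips = sum(noisy_move("C", level, rng) == "D" for _ in range(100_000))
    print(f"noise {level:.2%}: {flips} of 100000 cooperate moves flipped")
```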

In compromise dilemma Friedman, a greedy strategy, dominates the 
population when the noise is 1% or below. ATfT is the second best strategy and, 
together with Fair and AllD, replaces Friedman at 10% noise. Unlike the rest 
of the games there is a mixture of strategies winning each play for 0.1 to 10% 
noise. 

Two greedy strategies do well in PD with no noise or a low level of noise. 
Davis does well without noise and Friedman with 0.01% noise. Simpleton, a 
generous strategy, dominates the population when the noise is 0.1% or more. 

In chicken game three generous strategies, Tf2T, Grofman and Simpleton, 
almost entirely dominate the population under noisy conditions. With 
increasing noise Tf2T disappears first, then Grofman, leaving Simpleton as 
the single dominating strategy at 10% noise. 

Finally, in coordination game three generous strategies, AllC, Tf2T and 
Grofman, win almost all the games when noise is introduced. With 10% noise 
AllC wins all the games. 

³ Neither of the strategies has to be even-matched; it depends on the actual 
surroundings. 

6 Conclusions 

We investigated four different PD like games in a round robin tournament and 
a population tournament. The results were analyzed using our classification of 
generous, even-matched and greedy strategies. 

In the round robin tournament we found PD to be the game that favored 
greedy strategies the most. The chicken game and the coordination game 
favored generous strategies and compromise dilemma even-matched strategies. These 
results are not consistent with the common idea of treating the PD as the most 
important cooperating iterated game. We do not find these results surprising 
because all the strategies used are fully dependent on the mutual meetings. 

A more interesting investigation is to figure out what happens in a population 
tournament. If a strategy is generous, even-matched or greedy, it is so only in 
particular surroundings and will possibly change when the strategies change. 
A winning strategy in a population tournament has to do well against itself 
because there will be lots of copies of that strategy. A winning strategy must 
also be good at resisting invasion from other competing strategies; otherwise it 
will be impossible for it to become a single winner. 

These restrictions in a population tournament make it natural to look for 
winning strategies among originally generous or even-matched (e.g., TfT) 
strategies. For three of the games, the PD, the chicken game and the coordination 
game, this is true, with Tf2T and Grofman winning a big proportion of the 
population games. Contrary to what was advocated by Axelrod and others, TfT was 
not among the most successful strategies. 

The most divergent result was that compromise dilemma had two greedy 
strategies, Davis and Friedman, almost entirely dominating the population 
tournament. Both Davis and Friedman favor playing defect against a 
cooperating agent, but unlike AllD they are also able to play cooperate against a 
cooperating agent. Despite its close relationship to the PD, the compromise dilemma 
favors other, more greedy, successful strategies. 

When noise was introduced to the games, chicken game and coordination 
game almost entirely favored generous strategies. In PD, and even more in 
compromise dilemma, the greedy Friedman strategy did well. 

We think these results can be explained by looking at the original game 
matrices. For chicken game (D,D) does the worst, favoring generous strategies. 
Coordination game gives (C,C) the highest payoff, which outscores greedy 
strategies. PD is, compared to chicken game, less punishing towards (D,D), which 
allows greedy strategies to become more successful. In compromise dilemma (C,D) 
and (D,C) have the best scores, making a balance between different strategies 
possible. 

Like ESS, this description of MAS as a competition between generous and 
greedy strategies tries to find robust strategies that are able to resist invasion 
by other strategies. It is not possible to find a single best strategy that wins, but 
it is possible to tell what kinds of strategies will be successful. 
Abstract. This paper extends the N-person IPD game into a more 
interesting game in economics, namely, the oligopoly game. Due to its 
market share dynamics, the oligopoly game is more complicated and is in 
general not an exact N-person IPD game. Using genetic algorithms, we 
simulated the oligopoly games under various settings. It is found that, 
even in the case of a three-oligopolist (three-player) game, collusive 
pricing (cooperation) is not the dominating result. 

Keywords: Oligopoly, Cartels, Price Wars, Genetic Algorithms, State- 
Dependent Markov Chain, Coevolution. 



1 Motivation and Introduction 

In the past, the prisoner’s dilemma was frequently applied to the study of 
collusive pricing or predatory pricing. However, this application is largely restricted to 
the duopoly industry because most economists are only familiar with the 2-person 
Iterated Prisoner’s Dilemma (IPD) game. For the oligopoly industry, the 
more relevant one should be the n-person IPD game, which economists are less 
familiar with. Recently, the n-person IPD game was studied in Yao and Darwen 
(1994). Using genetic algorithms (GAs), they showed that cooperation can still be 
evolved in a large group, but that it is more difficult to evolve cooperation as the 
group size increases. Taking this result as a guideline for the oligopoly 
pricing problem, what the n-person IPD game tells us is that when the number 
of oligopolists is small, say 3, it is very likely to see the emergence of collusive 
pricing (cooperation). However, real data usually show that, even in a 
three-oligopolist industry, the observed pricing pattern is not that simple (Midgley, 
Marks and Cooper, 1996). 

This is a revised version of a paper presented at The Second Asia-Pacific Conference 
on Simulated Evolution and Learning in Canberra, Australia, 24-27 November, 1998. 
The authors thank two anonymous referees for helpful comments. Research support 
from NSC grant NSC. 86-2415-H-004-022 is also gratefully acknowledged. 

— First, while collusive pricing is frequently observed, it is continually 
interrupted by the occurrence of predatory pricing (price wars). 

— Second, it is not always true that oligopolists are either collectively charging 
high prices (collusive pricing) or low prices (price wars). In fact, a dispersion 
of prices can persistently exist, i.e., some firms are charging a higher price, 
whilst others are charging a lower price. 

— Third, the firms who charge a high price may switch to a low price in a later 
stage, and vice versa. 

These features seem to be difficult to display in 3-person IPD games (see 
Yao and Darwen, ibid., Figure 5). Therefore, one may reasonably conjecture that 
the oligopoly game is not an exact n-person IPD game. While they share some 
common features, there are other essential elements which distinguish these two 
games. 

In this paper, we consider the payoff matrix determined by the market share 
dynamics as such an essential element. In Section 2, we propose a very simple 
oligopoly game with 3 oligopolists. We then show in Section 3 that this setup 
disqualifies the oligopoly game from being an n-person IPD game. Due to the 
non-equivalence of these two games, we use genetic algorithms to simulate the 
evolution of oligopoly games in Sections 4 and 5. The simulation results are given 
in Section 5, followed by concluding remarks in Section 6. 



2 The Analytical Model 



For simplicity, an oligopoly industry is assumed to consist of three firms, say 
$i = 1, 2, 3$. At each period, a firm can either charge a high price $P_h$ or a low 
price $P_l$. Let $a_t^i$ be the action taken by firm $i$ at time $t$: $a_t^i = 1$ if firm $i$ 
charges $P_h$ and $a_t^i = 0$ if it charges $P_l$. To simplify notation, let $S_t$ denote the 
row vector $(a_t^1, a_t^2, a_t^3)$. To characterize the price competition among firms, the 
market share dynamics of these three firms are summarized by the following 
time-variant state-dependent Markov transition matrix, 

$$M_t = \begin{bmatrix} m_t^{11} & m_t^{12} & m_t^{13} \\ m_t^{21} & m_t^{22} & m_t^{23} \\ m_t^{31} & m_t^{32} & m_t^{33} \end{bmatrix}, \qquad (1)$$

where $m_t^{ij}$, the transition probability from state $i$ to state $j$, denotes the 
proportion of the customers of firm $i$ switching to firm $j$ at time period $t$. Let $n_t^i$ 
($i = 1, 2, 3$) be the number of customers of firm $i$ at time period $t$, and $N_t$ the row 
vector $[n_t^1, n_t^2, n_t^3]$. Without loss of generality, we assume that each consumer will 
purchase only one unit of the commodity. In this case, $N_t$ is also the vector of 
quantities consumed. With $N_t$ and $M_t$, the customers of each firm at period $t+1$ 
can be updated by: 

$$N_{t+1} = N_t M_t. \qquad (2)$$

To see the effect of price competition on the market share dynamics, the 
transition probabilities $m_t^{ij}$ are assumed to be dependent on the pricing strategy 
vector $S_t$. If the three firms charge the same price, then $M_t$ is an identity matrix. 
If only firm $i$ charges $P_h$, then it will lose a fraction $\alpha/2$ of its consumers 
each to firms $j$ and $k$, who charge $P_l$. Furthermore, if firms $i$ and $j$ charge $P_h$, 
then they each will lose a fraction $\alpha$ of their consumers to firm $k$, who charges $P_l$. 

Given these state-dependent transition matrices, Equation (2) can be 
rewritten as: 

$$N_{t+1} = N_t M_t(S_t), \qquad (3)$$

where $S_t = (a_t^1, a_t^2, a_t^3)$ and $a_t^i \in \{0, 1\}$. 
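
The update in Equation (3) can be sketched as follows. This is an illustration, not the authors' code: it assumes that a lone high-price firm loses α/2 of its customers to each rival (which makes the total loss α, in line with the equal payoffs reported for C1 and C0 in Table 1), and the function names are hypothetical.

```python
def transition_matrix(state, alpha):
    """Build M_t(S_t); row i says where the customers of firm i go."""
    m = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    high = [i for i, a in enumerate(state) if a == 1]  # firms charging P_h
    low = [i for i, a in enumerate(state) if a == 0]   # firms charging P_l
    if not high or not low:
        return m  # all firms charge the same price: identity matrix
    # Each high-price firm loses alpha in total, split evenly over the
    # low-price firms (alpha to one rival, or alpha / 2 to each of two).
    share = alpha / len(low)
    for i in high:
        m[i][i] = 1.0 - alpha
        for j in low:
            m[i][j] = share
    return m

def step(customers, state, alpha):
    """Equation (3): N_{t+1} = N_t M_t(S_t), with N_t a row vector."""
    m = transition_matrix(state, alpha)
    return [sum(customers[i] * m[i][j] for i in range(3)) for j in range(3)]

n = [1.0, 1.0, 1.0]
for _ in range(3):
    n = step(n, (1, 1, 0), 0.2)  # firms 1 and 2 charge high, firm 3 low
print(n)
```

Each row of the matrix sums to one, so the total number of customers in the industry is unchanged by this step; growth of the industry enters only through the function β introduced next.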

Equation (3) summarizes the intra-industry competition given a number of 
customers $n_t = \sum_{i=1}^{3} n_t^i$. The next step of our modeling is to endogenize $n_t$ by 
setting $n_{t+1}$ as a function of $S_t$. More precisely, 

$$n_{t+1} = n_t (1 + \beta), \quad \beta = \beta(S_t). \qquad (4)$$

The $\beta(\cdot)$ function explicitly shows how the market share of the industry can 
be affected by its pricing strategies $S_t$. The simple $\beta(\cdot)$ function considered in 
this paper is as follows. 



( 5 ) 

I Z^i—1 

if Ei=i«i = 3- 

where 6w ^ ^ dc > Sc. 

Given Equations (3)-(5), the objective of oligopolists is to maximize their 
profits or the present value of the firm, and the profit for a single period is 
given by Equation (6): 

$$\pi_s^i = (P_s^i - C)\, n_s^i, \qquad (6)$$

where $P_s^i$ is the price charged by firm $i$ at period $s$, $n_s^i$ the number of customers, 
and $C$ the fixed unit cost. $n_s^i$ can be obtained from Equations (3)-(6). 



3 The Oligopoly Game: an N-Person IPD Game? 

Before proceeding further, let us consider the relevance of the n-person IPD 
games to the oligopoly game. Is an oligopoly game necessarily an n-person IPD 
game? If not, what is their relation? For simplicity, let us consider the first $r$ 
periods of an oligopoly game. Here, “cooperate” (C) means “charging high prices 
for all $r$ periods” and “defect” (D) means “charging low prices for all $r$ periods”. 
We first work out the payoff matrix defined by Yao and Darwen (1994). In our 
case (3 oligopolists), there are six elements in the payoff matrix, namely $C_i$ and 
$D_i$ ($i = 0, 1, 2$). Here, $C_i$ ($D_i$) denotes the payoff for a specific player who plays 
C (D) when there are $i$ other players acting cooperatively. From Equations (3)-(6), $C_i$ 
and $D_i$ can be derived. Without loss of generality, let us assume that $\beta = 0$ and 
$n_1 = n_2 = n_3 = 1$; then the explicit solutions obtained are: 

$$D_2 = (P_l - C)\Big[3r - 2\sum_{i=1}^{r}(1-\alpha)^i\Big], \quad D_1 = (P_l - C)\Big[r + \tfrac{1}{2}r - \tfrac{1}{2}\sum_{i=1}^{r}(1-\alpha)^i\Big], \quad D_0 = (P_l - C)\,r,$$

$$C_2 = (P_h - C)\,r, \quad C_1 = C_0 = (P_h - C)\,\frac{(1-\alpha) - (1-\alpha)^{r+1}}{\alpha}.$$

Table 1. Parameters and Payoffs 

Set  Ph   Pl   C  α    r   D2     D1    D0   C2   C1    C0 
1    1.4  1.2  1  0.2  8   3.47   2.07  1.6  3.2  1.33  1.33 
2    1.4  1.2  1  0.2  25  13.40  7.10  5    10   1.60  1.60 
3    2    1.2  1  0.2  8   3.47   2.07  1.6  8    3.33  3.33 
4    2    1.2  1  0.2  25  13.40  7.10  5    25   3.98  3.98 



Whether the oligopoly game is an n-person IPD game depends on the 
following criteria (Yao and Darwen, 1994): 

- (1) $D_2 > C_2$, (2) $D_1 > C_1$, and (3) $D_0 > C_0$. 

- (4) $D_2 > D_1 > D_0$, and (5) $C_2 > C_1 > C_0$. 

- (6) $C_2 > 0.5(D_2 + C_1)$, and (7) $C_1 > 0.5(D_1 + C_0)$. 

It is not difficult to see that not all of these conditions can be satisfied. For 
example, in Table 1, four sets of parameters and their associated payoffs are 
given. The conditions which can be satisfied by these four sets of parameters are 
summarized in Table 2. 

Given the analysis above, we may consider the oligopoly game to be a 
perturbation or a generalization of an n-person IPD game, and it is interesting to see 
whether the evolution process of the n-person, in particular the 3-person, IPD 
game documented by Yao and Darwen (1994) still applies. 





Using Genetic Algorithms to Simulate the Evolution of an Oligopoly Game 297 



Table 2. Parameter Sets and Testing Results 

Inequality              Set 1  Set 2  Set 3  Set 4 
1. D2 > C2              >      >      <      < 
2. D1 > C1              >      >      <      > 
3. D0 > C0              >      >      <      > 
4. D2 > D1 > D0         >,>    >,>    >,>    >,> 
5. C2 > C1 > C0         >,=    >,=    >,=    >,= 
6. C2 > 0.5(D2 + C1)    >      >      >      > 
7. C1 > 0.5(D1 + C0)    <      <      >      < 

The sign > in columns 2-5 means the condition is satisfied. Other signs mean the 
condition is weakly violated (=) or strongly violated (<). 



4 Modeling the Adaptive Behavior of Oligopolists with GAs 

The main idea of genetic algorithms is to encode the variable one wants to 
optimize as a binary string and work with it. Following Midgley et al. (1996), 
we consider the following special class of pricing strategy $\psi$, 

$$\psi: \Omega_k \to \{0, 1\}, \qquad (7)$$

where $\Omega_k$ is the collection of all histories of the pricing states over the last 
$k$ periods. By this simplification, the oligopolist’s 
memory is assumed to be finite. 

While, potentially, different choices of $k$ may lead to quite different sets of 
strategies (Beaufils et al., 1998), the issue that concerns us is the smallest value of $k$ 
which can reasonably replicate the price dynamics of the oligopoly industry, and 
as we shall see later, setting $k$ equal to 1 is good enough to achieve this goal. 
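
To make the encoding concrete, the sketch below represents a memory-1 strategy as a binary string and applies the genetic operators listed in Table 3 (roulette-wheel selection, one-point crossover, mutation). Several details are assumptions, not taken from the paper: that a strategy with $k = 1$ is a lookup table of 8 bits, one per possible previous state of the three firms; that the first move needs no extra bits; and that immigration is omitted. All names are illustrative.

```python
import random

N_FIRMS = 3
N_STATES = 2 ** N_FIRMS  # possible previous-period states when k = 1

def action(strategy, prev_state):
    """Price choice (1 = high, 0 = low) prescribed for the previous state."""
    index = int("".join(str(a) for a in prev_state), 2)
    return strategy[index]

def roulette_select(population, fitness, rng):
    """Draw one strategy with probability proportional to its fitness.

    Fitness values (profits) are assumed non-negative and not all zero.
    """
    return rng.choices(population, weights=fitness, k=1)[0]

def one_point_crossover(a, b, rate, rng):
    """With probability `rate`, swap the tails of two parents at a random cut."""
    if rng.random() < rate:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(strategy, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in strategy]

def next_generation(population, fitness, rng,
                    crossover_rate=0.8, mutation_rate=0.0001):
    """Produce a new population of the same size (rates as in Table 3)."""
    children = []
    while len(children) < len(population):
        parent_1 = roulette_select(population, fitness, rng)
        parent_2 = roulette_select(population, fitness, rng)
        child_1, child_2 = one_point_crossover(parent_1, parent_2, crossover_rate, rng)
        children.append(mutate(child_1, mutation_rate, rng))
        children.append(mutate(child_2, mutation_rate, rng))
    return children[:len(population)]

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(N_STATES)] for _ in range(30)]
fitness = [1.0 + sum(s) for s in population]  # placeholder fitness, not real profits
population = next_generation(population, fitness, rng)
print(len(population), population[0])
```

In a full simulation the placeholder fitness would be replaced by the profits of Equation (6) accumulated over the $r$ periods of a single play.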

5 Experimental Designs and Results 

For all the experiments conducted in this study, $P_h$ is set at 2, $P_l$ at 1.2 and 
$C$ at 1. Other control parameters of the GAs are set according to Tables 3 and 4. 

The first experiment is to test whether GA-based oligopolists can achieve 
a reasonable level of adaptation. For this purpose, we design the experiment 
“absolute-loyalty-with-no-external-effects”. In terms of notation, absolute loyalty 
means $\alpha = 0$, and the absence of external effects means $\beta = 0$. When $\alpha = \beta = 0$, 
the most profitable pricing strategy for firm $i$ is obviously an unconditional 
high-price strategy, i.e., 

$$\psi_i = 1, \quad \forall S_t \in \Omega_k, \qquad (8)$$

since a lower price will not help the firm to gain any advantage over its 
competitors or other industries. So, we expect that the GA-based oligopoly industry 
should converge to a state of a collusive price, i.e., the state (1, 1, 1). 

In order to test whether GAs can find this simple solution, we ran 
experiment 1 for 1000 periods (125 generations) with the prespecified parameters 
Table 3. The Parameters of the GA-based Oligopoly Game 

Memory size (k)                             1 
Number of oligopolists                      3 
Population size                             30 
Number of periods in a single play (r)      8 (25) 
Selection scheme                            Roulette-wheel selection 
Fitness function                            Profits (π) 
Number of generations evolved (Gen)         125 (126) 
Number of periods (T)                       1000 (3150) 
Crossover style                             One-point crossover 
Crossover rate                              0.8 
Mutation rate                               0.0001 
Immigration rate                            0.001 


Table 4. Experimental Designs and Results

Experiment   r    # of Simulations   $\alpha$   5w   5w   A   fc   Results
Pilot        8    5                  0          0    0    0   0    C(5)
1            25   5                  0.2        0    0    0   0    C(2), c(1), NC(2)
2            8    5                  0.2        0    0    0   0    C(5)

given in Tables 3 and 4. To facilitate the report of simulations, we need a few
more notations. Let "W" refer to the state "price war" (0,0,0), "C" the state
"collusive price" (1,1,1), "w" the states which are closer to "W" and "c" the
states closer to "C". "Closer" is defined in terms of Hamming distance. Thus,
"w" includes states (0,0,1), (0,1,0) and (1,0,0), and "c" includes (1,1,0),
(1,0,1) and (0,1,1). Since there are 30 pairs of oligopolists in each period of
the evolution, to summarize simulation results of $S_t$ in terms of its
distribution, let $p_t^W$, $p_t^w$, $p_t^c$ and $p_t^C$ denote respectively the
percentage of the pairs who, in period t, are in the states labeled "W", "w",
"c" and "C". Figures 1.1-1.5 display the time series plot of the distribution
of $S_t$. From Figures 1.1-1.5, we can see that the industry converges to the
state "C" (1,1,1) very quickly.
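
The labelling above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample period are hypothetical. Since a state is a triple of price bits, its Hamming distance to (0,0,0) is simply the number of firms charging the high price.

```python
from collections import Counter

def label(state):
    """Label a 3-firm price state by how many firms charge the high price."""
    highs = sum(state)  # Hamming distance to the price-war state (0,0,0)
    return {0: "W", 1: "w", 2: "c", 3: "C"}[highs]

def distribution(states):
    """Share of the observed states carrying each of the four labels."""
    counts = Counter(label(s) for s in states)
    return {name: counts[name] / len(states) for name in ("W", "w", "c", "C")}

# Hypothetical period with 30 observed states.
period = [(1, 1, 1)] * 24 + [(1, 1, 0)] * 3 + [(0, 0, 1)] * 3
print(distribution(period))
```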

In Experiment 1, $\alpha$ is set to 0.2. Meanwhile, we still assume the
absence of external effects, i.e., $\beta$ remains zero. In this situation, it
is not difficult to see that the best solution is to form a cartel and to
jointly charge a high price. To see how well our GA-based adaptive oligopolists
evolve in this scenario, we ran Experiment 1 for 3150 periods (126 generations),
and the time series of the distribution of $S_t$ is shown in Figures 2.1-2.5.
From Figures 2.2 and 2.5, we can see that, as in the Pilot Experiment, $p_t^C$
gradually increases and eventually converges to 1. However, compared with
Figures 1.1-1.5, the convergence is much slower.

The interesting patterns observed in this experiment are shown in Figures
2.3 and 2.4. In these two simulations, we observe an oscillation between the
states "w" and "c", i.e., the three firms are continuously charging different
prices. These are the second and the third stylized facts of oligopoly
industries summarized in Section 1. The emergence of persistently heterogeneous
pricing may be caused by the inconsistency between "$D_2 < C_2$" and
"$D_1 > C_1$" for the first r periods (Table 2). This inconsistency may
encourage an early defection, and once that happens, by the path-dependent
property, the oligopoly game is further perturbed away from a standard n-person
IPD game and may support its own complex dynamics. To see whether or not this
conjecture is correct, we design Experiment 2 as shown in Table 4.

The only difference between Experiment 1 and Experiment 2 lies in the choice
of the parameter r, which has been changed from 25 to 8. By Table 2, this
makes the first three inequalities all consistent, i.e., $D_i < C_i$,
$i = 0, 1, 2$. This structure punishes early defection and keeps the payoff
structure unchanged, so that the whole process can be reinforced (an aspect of
the path-dependent property). The simulation results, as we conjectured, all
converge to the state of collusive pricing.

6 Concluding Remarks 

The message of this paper is simple: the oligopoly game in general is not
an n-person IPD game and is, in effect, more complicated than that. Therefore,
the simulated results can be quite rich even in a 3-person oligopoly game. But
that also bridges the gap between the complexity of the oligopolists' pricing
behaviour and the simplicity of the insight gained from the n-person IPD
games. In a word, we think that the oligopoly game is a meaningful
generalization of the n-person IPD game, and a formal mathematical treatment of
it is definitely a direction for future research.



References 

1. Beaufils, B., J.-P. Delahaye and P. Mathieu (1998), "Complete Classes of Strategies
for the Classical Iterated Prisoner's Dilemma," in V. W. Porto, N. Saravanan, D.
Waagen and A. E. Eiben (eds.), Evolutionary Programming VII, pp. 32-41.

2. Midgley, D. F., R. E. Marks, and L. G. Cooper (1996), "Breeding Competitive
Strategies," forthcoming in Management Science.

3. Yao, X. and P. J. Darwen (1994), "An Experimental Study of N-Person Iterated
Prisoner's Dilemma Games," Informatica, Vol. 18, pp. 435-450.




An Evolutionary Study on Cooperation in 
N-person Iterated Prisoner’s Dilemma Game 



Yeon-Gyu Seo and Sung-Bae Cho 

Department of Computer Science, Yonsei University 
134 Shinchon-dong, Sudaemoon-ku, Seoul 120-749, Korea 
[kitestar, sbcho]@candy.yonsei.ac.kr



Abstract. The iterated prisoner's dilemma game has been used to study
the evolution of cooperation in social, economic and biological systems.
There has been much work on the relationship between the number of players
and cooperation, on evolutionary strategy learning as a kind of machine
learning, and on the effect of the payoff function on cooperation. This paper
attempts to show that cooperative coalition size depends on the payoff
function and that localization affects the evolution of cooperation in the
N-player Iterated Prisoner's Dilemma (NIPD). Localization makes individuals
interact or learn only with adjacent individuals. Experimental results show
that cooperative coalition size increases as the gradient of the payoff
function for cooperation becomes steeper than that of the defector's payoff
function, or as the minimum coalition size gets smaller. It is also shown that
localization of interaction is an important factor affecting cooperative
coalition.



1 Introduction 

The iterated prisoner's dilemma (IPD) game has been studied for a long time. In
general, a player in IPD must choose one of two decisions, defect (D) or
cooperate (C). Table 1 shows the payoffs for all the possible combinations of
decisions. The game is repeated indefinitely and none of the players knows when
it ends. No matter how many players cooperate, any one of them will earn a
better payoff by defecting. Therefore, defection may be the rational choice,
and all players may come to select D and obtain payoff P. However, if all
cooperate, they get a better score than if all defect. This is the dilemma
that players face in the IPD game.

Originally, most work focused on 2IPD. However, 2IPD cannot model such complex
problems as social and economic problems in the real world. The NIPD game has
appeared as a more realistic model. Table 2 shows an example of a payoff
function in the NIPD game. The basic principle of the 2IPD game also holds for
the NIPD game: defection is dominant for each player. In the NIPD game, there
are many parameters to be considered, such as payoff function [3], noise,
population structure, localization [1], the shadow of the future [1] [2],
the number of players [3] and so on. Among these parameters, we consider two:
payoff function and localization.



Table 1. Payoff matrix in 2IPD. T > R > P > S, 2R > T + P

              Cooperate   Defect
Cooperate     R           S
Defect        T           P



The payoff function is generally fixed in the game, but there exist many
criteria for payoff functions in the real world, especially in economic and
social systems. We examine the effect of the payoff function in the first
experiment, and the second reveals the relationship between cooperative
coalition and localization. The latter is divided into two factors: learning
and interaction. These factors are reported to affect the emergence of
cooperation. This paper uses co-evolutionary strategy learning, which has the
advantage of reflecting a dynamic environment. A genotype holds all the
information that determines a player's next move according to the other
players' previous moves as well as his own previous moves.

In Section 2, we discuss the relationship between cooperation and payoff
functions, and Section 3 focuses on localization. In Section 4, we conduct the
NIPD game repeatedly with various payoff rules and test the localization
effects in learning and interaction.



Table 2. Payoff matrix in NIPD.

No. of cooperators   0     1     ...   x     ...   N-1
Cooperate            C_0   C_1   ...   C_x   ...   C_{N-1}
Defect               D_0   D_1   ...   D_x   ...   D_{N-1}



2 Payoff Function 

In this section, we discuss whether the payoff function is important to
cooperative coalition. In real society, a selfish and rational individual tends
to select an action according to the payoff it yields. If there is room for
getting more payoff by a specific action, such as a cooperative coalition in
the IPD game, an individual might select the collective action to get a better
result. Thus, we treat the payoff function as a very important factor in this
paper. Generally, the payoff function in the NIPD game satisfies the following
conditions.

$C_x > C_{x-1}, \quad D_x > D_{x-1}, \quad D_x > C_x, \quad C_{N-1} > D_0$



The payoff function is a very important factor in determining the minimum
coalition size. According to Schelling, the minimum coalition size is the
number of players among whom a player obtains some gain, zero or more. A
cooperative coalition can emerge above the minimum coalition size.
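
To make the four conditions on the NIPD payoff function concrete, here is a minimal Python check, written under the assumption that x runs over 0, ..., N-1 as in Table 2. The function name `satisfies_nipd` and the second, quadratic payoff are hypothetical and chosen only for illustration.

```python
def satisfies_nipd(C, D, n_players):
    """Check C_x > C_{x-1}, D_x > D_{x-1}, D_x > C_x and C_{N-1} > D_0."""
    xs = range(n_players)
    increasing_c = all(C(x) > C(x - 1) for x in xs if x > 0)
    increasing_d = all(D(x) > D(x - 1) for x in xs if x > 0)
    defect_dominates = all(D(x) > C(x) for x in xs)
    cooperation_pays = C(n_players - 1) > D(0)
    return (increasing_c and increasing_d
            and defect_dominates and cooperation_pays)

k = 1
# Linear payoffs C_x = 3x - k and D_x = 3x with four players.
print(satisfies_nipd(lambda x: 3 * x - k, lambda x: 3 * x, n_players=4))
# A hypothetical quadratic C_x that overtakes D_x breaks the third condition.
print(satisfies_nipd(lambda x: x ** 2 - k, lambda x: 3 * x, n_players=8))
```

With these inputs the first call passes all four tests and the second fails because cooperation pays more than defection once x is large enough.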

In the IPD game, the payoff function is fixed and the payoff of defection is
generally higher than that of cooperation, but we can easily find many criteria
for payoff functions in the real world. Therefore, we also examine linear and
quadratic payoff functions that do not obey the payoff rule of the NIPD game.
This allows us to observe the role of the payoff function in more depth. Fig. 1
shows some possible payoff functions. Here, the important parameters of the
payoff function are the x-intercept and the gradient of C. In case (c) of this
figure, as the number of cooperators increases, the payoff of cooperation
overtakes that of defection. This does not fit the payoff rule of the NIPD
game, but there are many similar payoff functions in social and economic
systems. We attempt to observe the coalition size by changing the payoff
function in the NIPD game.

In the case of Fig. 1(a), the larger k (the offset that sets the y-intercept of
the cooperation payoff) is, the smaller the number of cooperators is. In the
case of Fig. 1(b), the gradient of D is steeper than that of C. It can be
expected that the result ends up with defection because the payoff is
unfavorable to cooperators.



Fig. 1. Payoff functions. The solid line is the payoff function for defection and the
dashed line for cooperation. (a) $C_x = 3x - k$ and $D_x = 3x$ (b) $C_x = 2x - k$ and
$D_x = 3x + 1$ (c) $C_x = x^2 - k$ and $D_x = \sqrt{2x}$



3 Localization 

The emergence of cooperation in the game is strongly affected by the
localization of both interaction and learning [3]. Localizing learning means
restricting the subset of the population from which players can learn
better-performing strategies. Nowak and May study a population of agents
distributed on the squares of a torus which are only capable of the always
defect (AD) and always cooperate (AC) strategies. Each agent interacts with the
agents on all eight adjacent squares and imitates the strategy of any
better-performing one. Under certain payoffs, cooperative behavior can be
sustained in clusters of agents that insulate cooperators from hostile ADs.
Waring and Hoffmann [1] consider localized interaction and learning between
agents on a torus that employ Moore machines to play the game.

In order to ascertain the effect of localizing learning and interaction, we make
each individual interact and learn with adjacent individuals distributed on a
torus, which avoids the boundary effect whereby players on the boundary have an
unequal number of neighbors. In many cases, it is not clear why the localization
of learning and of interaction should coincide. It is easy to imagine a
situation where individuals interact locally while being able to observe what
individuals outside their interaction neighbourhood are doing.
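
The wrap-around that removes the boundary effect can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation: the function name and the 10 x 10 grid are assumptions, and it covers only the eight-neighbour case described for Nowak and May.

```python
def moore_neighbors(row, col, n_rows, n_cols):
    """Eight adjacent cells of (row, col) on a torus (indices wrap around)."""
    return [((row + dr) % n_rows, (col + dc) % n_cols)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

# A corner cell of a hypothetical 10 x 10 grid still has eight neighbours,
# because row -1 wraps to row 9 and column -1 wraps to column 9.
print(moore_neighbors(0, 0, 10, 10))
```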



4 Experiments 

For the experiments on payoff function and localization, we use a population
size of 100, a crossover rate of 0.6, a mutation rate of 0.001, and two-point
crossover with elite preservation. For localization, we consider two factors,
learning and interaction, which might have different effects on the evolution
of coalition size.
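
For readers unfamiliar with these operators, the sketch below shows one way two-point crossover with elite preservation can be written in Python. It is only an assumption-laden illustration: genotypes are taken to be bit strings, parents are drawn uniformly at random (the selection scheme is not stated above), and all function names are hypothetical.

```python
import random

def two_point_crossover(a, b, rng):
    """Swap the segment between two cut points of equal-length genotypes."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def flip(bit):
    """Return the opposite bit character."""
    return "1" if bit == "0" else "0"

def next_generation(population, fitness, rng, p_cross=0.6, p_mut=0.001):
    """One generation: keep the best genotype, then breed the rest."""
    elite = max(population, key=fitness)
    children = [elite]                        # elite preservation
    while len(children) < len(population):
        a, b = rng.sample(population, 2)      # uniform parent choice (assumed)
        if rng.random() < p_cross:
            a, b = two_point_crossover(a, b, rng)
        for child in (a, b):
            children.append("".join(
                flip(bit) if rng.random() < p_mut else bit for bit in child))
    return children[:len(population)]
```

A call such as `next_generation(pop, fitness, random.Random(0))` returns a population of the same size whose first member is the best genotype of the previous generation.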



4.1 Payoff function 



A. $C_x = 3x - k$ and $D_x = 3x$

This case is an instance of the general IPD game. The cooperative coalition
size is shown in Fig. 2 according to k. When the number of players is small,
cooperative coalition size depends on the number of players: they all cooperate
in this experiment. However, as the number of players gets larger, the
difference in cooperative coalition size across values of k becomes larger.
This result indicates that the number of players affects cooperative coalition.

We can see in Fig. 2 that cooperative coalition gets lower as k becomes
larger. This result supports the view that the number of players and the
y-intercept are important factors affecting cooperation.

B. $C_x = 2x - k$ and $D_x = 3x$

The gradient of the payoff function of defection is steeper than that of
cooperation. In this case, we would expect all players to defect. However, the
results show that a high cooperative coalition can appear even in this case. We
can see in Fig. 3(a) that cooperative coalition is stabilized at a high level
when the number of players is 4 and k < 3. In Fig. 3(b), cooperative coalition
size is small because, as the number of players becomes larger, both the
gradient of the payoff function and the number of players are adverse to
cooperators. Nevertheless, if the number of players is small and
$C_{N-1} > D_0$, the cooperative coalition can be stabilized at a high level.

Fig. 2. $C_x = 3x - k$ and $D_x = 3x$. (a) 4 players (b) 8 players

Fig. 3. $C_x = 2x - k$ and $D_x = 3x$. (a) 4 players (b) 8 players



C. $C_x = x^2 - k$ and $D_x = \sqrt{3x}$

In this case, if the number of cooperators is large, the payoff of cooperation
overtakes that of defection. This leads us to expect cooperative coalition to be
stabilized at a high level. However, when k is large and the number of players
is small, all players defect. This is because the number of players is less
than or equal to the minimum coalition size. In this case, the larger the
number of players is, the higher the cooperative coalition is.

From these experiments, we can identify what is essential for cooperative
coalition to evolve to a high level: the payoff for all C and the minimum
coalition size. If the gradient of the payoff for C is steeper than that for
all defect, cooperation is stabilized at a high level, as shown in Fig. 4.
Here, the minimum coalition size is determined by the payoff function. If the
minimum coalition size is small, a high level of cooperation can emerge [3].

Fig. 4. $C_x = x^2 - k$ and $D_x = \sqrt{3x}$. (a) 4 players (b) 16 players



A high cooperation level is determined by various factors such as the number
of players, the gradient of the payoff function, the x-intercept, and so on.
From the simulations, we can see that a high cooperation level can emerge as
long as conditions such as a small number of players, a small x-intercept and a
steep gradient of the payoff function for C are satisfied.

4.2 Localization 

Experiments have been conducted on two aspects. One is localizing interaction
and the other is localizing learning. It is not clear that the two factors have
the same effect on the cooperative coalition.

Localization of interaction A high level of localization of interaction can
improve cooperativity, but it can do so only with localized learning. We
experiment with localizing interaction and learning separately. This paper uses
a genetic approach to represent strategies, and a genotype holds all the
information needed to determine the next move. Fig. 5 shows the variations of
coalition size in the case of localizing interaction. We can see that
cooperative coalition size is large when the level of localization is high.

Fig. 5. Localization of interaction. (a) 4 players and 2 neighbors (b) 4 players
and 8 neighbors (c) 4 players and 10 neighbors

When the local size is small, players gradually cooperate, but it takes some
time for cooperation to stabilize. When the number of neighbours is 8 or 10,
almost all players defect at first. As time goes on, players turn to sustained
cooperation. Compared with a size of 2, the behaviour is somewhat unstable, but
it takes little time to reach a stable state.

Localization of learning Like localizing interaction, localizing learning
could be an important factor in improving cooperativity [1]. However,
experimental results indicate that it is adverse to the evolution of
cooperation. It seems that localization of learning prevents the population
from evolving to a state different from the initial state. Fig. 6 shows the
result.

We can conclude that localization of interaction is very important in affecting
cooperative coalition size, but localizing learning has an ambiguous effect on
cooperative coalition.

Fig. 6. Localization of learning. The dotted line is for 2 neighbors and the large
dots are for the general case, when the number of players is 4.



5 Concluding Remarks 

This paper studies cooperative coalition through experiments on payoff
function and localization. In real society, the payoff function acts on the
choices of individuals in economic and social systems. Generally, in the IPD
game, the payoff function is fixed, but we can easily find many criteria for
payoff functions in the real world. Therefore, we have also considered payoff
functions which do not fit the original payoff rule of the NIPD game.

A series of simulations shows that the payoff function is very important for
improving cooperative coalition. The steeper the gradient of the payoff
function of C is, the higher the level of cooperative coalition is. In this
case, the larger the number of players is, the higher the level of cooperative
coalition is, especially when the gradient of the payoff function for C is
steeper than that of D.

The localization of interaction also improves the level of cooperative coalition
size. If the local size is small, cooperative coalition tends to be stabilized
at a high level. However, the effect of local learning is obscure. We could not
confirm the effect of localization of learning, which requires further study.



Abstract. This paper describes the emergence and collapse of a state that
makes it possible for citizens to do something which they will not do
voluntarily. The model is a generalisation of the multi-stage game of Okada
and Sakakibara (1991). The general model, its mathematical analysis
under simplifying conditions and simulations for more general cases are
presented. The results of simulations suggest that selfish but rational
people may agree to make a state, which grows as the public capital
stock accumulates but collapses when the stock reaches a certain level.

1 Introduction 

The tragedy of the commons is a well-known example of how people fail to
cooperate in maintaining public capital. In such circumstances people may
voluntarily make a state that forces them to construct and maintain the public
capital stock. Okada and Sakakibara (1991) present a multi-stage game to show
this possibility. We shall present a more general model and analyse it both
mathematically and by simulations.

In Section 2 we shall explain the basic model, which is divided into four
subgames: first, each inhabitant announces whether he or she becomes a citizen
or an outsider; secondly, all citizens propose tax rates, of which the smallest
is adopted as the tax rate; thirdly, every citizen proposes the ratio of the
enforcer's salary to the total tax revenue, and the person who proposed the
minimum ratio is elected as the enforcer, who watches for tax evasion without
engaging in private business; last, taxpayers pay taxes honestly or become tax
evaders, whose income from private business will all be confiscated by the
enforcer if he or she detects the tax evasion. In Section 3 we shall derive the
subgame-perfect Nash equilibrium for a simple case mathematically. In Section 4
we shall show the results of simulations for more general cases, which suggest
that some inhabitants may make a state, which grows as the public capital stock
accumulates but collapses when the stock reaches a certain level.

2 The model

The outline of our model is as follows. There live n inhabitants in a valley
irrigated by a canal. Inhabitants make shovels in the winter, dredge the canal in
the spring and grow rice in the summer. They exchange all or some of the shovels
they made for rice with foreigners living outside the valley, while using the rest
of the shovels for dredging the canal. Their non-agricultural income is defined
as the rice value of the shovels they made, which may differ from inhabitant
to inhabitant according to their skill, while their agricultural income is the
harvest of rice, which depends on the location of their private rice field as
well as the condition of the canal.

For the sake of simplicity, let us assume the following: the non-agricultural
income of the ith inhabitant is $\gamma_i$, while his agricultural income is
$\beta_i K$, where K represents the depth of the canal; if a unit value of
shovels is used for dredging the canal, they are all worn out while increasing
the depth of the canal by one inch. In other words, if the depth of the canal is
K at the beginning of the spring and the ith inhabitant contributes $100 t_i$
percent of the shovels he made to the dredging of the canal, the total income of
the ith inhabitant is
$(1 - t_i)\gamma_i + \beta_i (K + \sum_{j=1}^{n} t_j \gamma_j)$.

The difficulty the valley faces is that no one may contribute their shovels to
the dredging of the canal even if doing so can increase every inhabitant's
income. This is because the benefit of the ith inhabitant's contribution to the
dredging of the canal is distributed among all inhabitants: it decreases his
non-agricultural income by $t_i \gamma_i$ while increasing the jth inhabitant's
agricultural income by $\beta_j t_i \gamma_i$. In other words, the valley's
income increases in net terms if $1 < \sum_{j=1}^{n} \beta_j$, while the ith
inhabitant's total income increases only if $1 < \beta_i$. Hence no one
contributes if $\beta_i < 1$ for all i, even if everyone's income would increase
if everyone contributed only a small portion of his income.
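
A small numerical check may help here. The Python sketch below evaluates the total income $(1 - t_i)\gamma_i + \beta_i (K + \sum_j t_j \gamma_j)$ for hypothetical values (five inhabitants, $\gamma_i = 1$, $\beta_i = 0.5$, K = 0), chosen so that each $\beta_i < 1$ while the $\beta_j$ sum to more than 1. The function name and all numbers are assumptions made for the example.

```python
def income(i, t, gamma, beta, K):
    """Total income of inhabitant i given everyone's contribution rates t."""
    dredging = sum(tj * gj for tj, gj in zip(t, gamma))  # added canal depth
    return (1 - t[i]) * gamma[i] + beta[i] * (K + dredging)

n = 5
gamma = [1.0] * n   # hypothetical non-agricultural incomes
beta = [0.5] * n    # each beta_i < 1, but their sum is 2.5 > 1
K = 0.0

nobody = [0.0] * n
everybody = [0.1] * n
only_first = [0.1] + [0.0] * (n - 1)

print(income(0, nobody, gamma, beta, K))      # nobody contributes
print(income(0, everybody, gamma, beta, K))   # everyone contributes 10 percent
print(income(0, only_first, gamma, beta, K))  # inhabitant 0 contributes alone
```

With these numbers, universal contribution raises inhabitant 0's income above the no-contribution level, while contributing alone lowers it, which is the free-rider problem described above.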

Some organization or system of monitoring and punishment may be required
to make people contribute to the accumulation and maintenance of the public
capital stock. In this paper we examine the following scenario, a four-stage
game.



Subgame 1 

Each inhabitant announces whether he or she becomes a citizen or an outsider.
Accordingly the n inhabitants in the valley,
$\{\text{Inhabitant } i \mid i \in N = \{1, 2, \ldots, n\}\}$, are divided into
$m$ ($0 \le m \le n$) citizens, $\{\text{Inhabitant } i \mid i \in M \subseteq N\}$,
and $n - m$ outsiders, $\{\text{Inhabitant } i \mid i \in L = N - M\}$, where
$M \cup L = N$ and $M \cap L = \emptyset$. The citizens advance to the following
stages to determine their roles or duties as well as the penalty which may be
imposed on those who do not perform them, while the outsiders can enjoy all
benefits from the public capital without making any contribution to its
accumulation or maintenance.

Subgame 2

Every citizen announces the acceptable tax rate on non-agricultural income,
$\tau_i$. The minimum $\tau_i$ is adopted as the tax rate of the state:
$\tau = \min_{i \in M} \tau_i$. (Everyone can virtually dissolve the state by
proposing $\tau_i = 0$.)



Subgame 3 

Every citizen offers him/herself as a candidate for the enforcer, who makes
neither shovels nor rice in order to concentrate on monitoring the other
citizens, by declaring the ratio of the enforcer's salary to the tax revenue of
the state, $\theta_i$. The person who has proposed the minimum $\theta_i$ is
elected as the enforcer and is paid accordingly: if
$\min_{i \in M} \theta_i = \theta_e = \theta$, Inhabitant e is the enforcer,
whose salary is $\theta \tau \sum_{i \in T} \gamma_i$, where Inhabitants i
($i \in T$) pay taxes honestly while Inhabitants j ($j \in U$) pay no taxes
($M = \{e\} \cup T \cup U$ and
$\{e\} \cap T = T \cap U = U \cap \{e\} = \emptyset$). In addition to the
salary, the enforcer can confiscate all non-agricultural income of the tax
evaders he or she has found out, whose expected value,
$\varepsilon_{m-1} \sum_{j \in U} \gamma_j$, is part of his/her income. Here it
is assumed that Inhabitant e can find out each tax evader with probability
$\varepsilon_l$ if he or she monitors $l$ citizens.



Subgame 4 



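The $m - 1$ taxpayers make shovels and rice, and pay or do not pay taxes. As
a result, Inhabitant i expects the following income:

$$
E_i = \begin{cases}
\theta \tau \sum_{j \in T} \gamma_j + \varepsilon_{m-1} \sum_{j \in U} \gamma_j & \text{if } i = e \\
(1 - \tau)\gamma_i + \beta_i \{(1 - \theta)\tau \sum_{j \in T} \gamma_j + K\} & \text{if } i \in T \\
(1 - \varepsilon_{m-1})\gamma_i + \beta_i \{(1 - \theta)\tau \sum_{j \in T} \gamma_j + K\} & \text{if } i \in U \\
\gamma_i + \beta_i \{(1 - \theta)\tau \sum_{j \in T} \gamma_j + K\} & \text{if } i \in L
\end{cases} \quad (1)
$$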
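
The four cases of Eq. (1) translate directly into code. The Python sketch below is an illustration under assumptions: `eps` stands for the detection probability $\varepsilon_{m-1}$, inhabitants are indexed from 0, and the function name and the example values are hypothetical.

```python
def expected_income(i, enforcer, payers, evaders,
                    gamma, beta, tau, theta, eps, K):
    """Expected income of inhabitant i, following the four cases of Eq. (1)."""
    revenue = tau * sum(gamma[j] for j in payers)    # taxes actually paid
    if i == enforcer:
        # Salary plus the expected value of confiscated income.
        return theta * revenue + eps * sum(gamma[j] for j in evaders)
    public = beta[i] * ((1 - theta) * revenue + K)   # return from the canal
    if i in payers:
        return (1 - tau) * gamma[i] + public
    if i in evaders:
        return (1 - eps) * gamma[i] + public
    return gamma[i] + public                         # outsider

# Hypothetical example: inhabitant 0 enforces, 1 and 2 pay, 3 stays outside.
gamma = [1.0, 1.0, 1.0, 1.0]
beta = [0.9, 0.9, 0.9, 0.9]
args = dict(enforcer=0, payers={1, 2}, evaders=set(),
            gamma=gamma, beta=beta, tau=0.5, theta=0.2, eps=0.2, K=0.0)
for i in range(4):
    print(i, expected_income(i, **args))
```

In this example the outsider earns more than the honest taxpayers, which is the incentive problem that the announcement stage in Subgame 1 has to overcome.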



3 Game Theoretic Analysis 

As to the range of the exogenous parameters $\beta_i$, $\gamma_i$,
$\varepsilon_m$, $K$ and $n$ (where $1 \le i \le n$ and $2 \le m \le n$), let us
assume the following: $0 < \beta_i < 1$, $0 < \gamma_i$,
$0 < \varepsilon_m < 1$, $0 \le K$ and $3 \le n$. Here the first condition
implies that no one voluntarily contributes to the accumulation of the public
capital, while the last one is a necessary condition for the emergence of a
state; it can readily be checked that if a state is made by two persons, the
only taxpayer's income is, whether he or she honestly pays tax or not, smaller
than it would be if he or she were an outsider.

Although we can prove the existence of the unique subgame-perfect Nash
equilibrium for our model as well as its mathematical expression under more
general conditions, we here mention only the simple and symmetric
case where the following conditions are satisfied:

1. $\beta_i = \beta$, $\gamma_i = \gamma$ and $\varepsilon_m = \varepsilon$ for all $1 \le i \le n$ and $1 \le m \le n$.

2. Every inhabitant knows the structure of the game as well as all the exogenous
parameters as common knowledge.

3. In Subgame 3, the enforcer is chosen by lottery if more than one citizen
proposes the minimum $\theta$.

4. In Subgame 1, inhabitants announce by turns whether they join the state
or not.

On the first three assumptions we can solve Subgame 4, Subgame 3 and
Subgame 2 in this order. In fact $\tau$ and $\theta$ are expressed in terms of
$m$, the number of citizens $\#M$:
if 2 < TO, max[e,r^] < min[j^,fe] and K < 

T = max[e, r^] and 0=0; (2) 

if 2 < TO, < e and 0 < j3m — (3 —1, 

T = e and 9=9; (3) 

if 2 < TO, max[e, f] < min[Y^, r®, I] and and 0 < Pm — P — 1, 

T = min[ ^ , r^, 1] and 0 =0; (4) 

1 P 



otherwise 


r = 0 . 


(5) 


Here 




(6) 




- _ (1 - r )7 -b /3 {t(to - 1)7 -b K} 
(1 -b P)t (to - 1)7 


(7) 




1 j + PK 

T = 


( 8 ) 




m7 




P _ + PK) - (1 -b P)e{m - 1)7 

(/3 - TO -b 1)7 


(9) 




c(to — 1)7 — / 3(7 -b ) 
(1-/3) (to- 1)7 


( 10 ) 




P{l + PK) 
{Pm — P — 1)7 


( 11 ) 



Hence for every M, every inhabitant's expected income is uniquely determined;
for example the expected income of any citizen is determined as
$\frac{1}{m}$ × the enforcer's expected income + $(1 - \frac{1}{m})$ × the
average citizen's income (as is readily checked, every citizen is an honest
taxpayer and earns the same income). This ensures that when he or she must say
whether he or she joins the state, every inhabitant can use backward induction
to see which choice brings him or her the greater income. This ensures the
existence of the unique subgame-perfect Nash equilibrium.



4 Simulations 



The analysis of the previous section suggests that even if the existence of the
unique subgame-perfect Nash equilibrium is mathematically proved, its explicit
expression is often unattainable or rather complicated. To see how the
subgame-perfect Nash equilibrium changes as time passes, simulation is
necessary. In fact simulation makes it possible to examine the dynamics of the
model under less restrictive conditions. As an example, in this section we show
the simulation of cases where non-agricultural income is not common to all
inhabitants. To put it concretely, we shall show the results of simulation,
assuming the following: $n = 7$; $\beta = 0.9$; $\gamma_i = 0.6 + 0.4i$;
$\varepsilon = 0.2$; $K(0) = 0$;
$K(t) = K(t-1) + (1 - \theta(t))\tau(t) \sum_{j \in T(t)} \gamma_j$. We shall
show two cases for these parameters: Case 1 (where inhabitants announce whether
they become a citizen or an outsider in order of their productivity) and Case 2
(where they announce in the reverse order).

In Figure 1 the left-side graphs show Case 1 while the right-side ones
describe Case 2.

Figure 1 shows how $E_i$ changes as time passes.




Fig. 1. The dynamics of the state. Panel titles: "P7 declares first" and "P1 declares first".

Since both dynamics are basically the same, let us explain only Case 1.

In Year 0 Inhabitants 1, 2, 3 and 4 make a state and Inhabitant 1, who
has the smallest productivity among them, is elected to be the enforcer. This is
because the values of the parameters and the initial condition are such that
only a state with four individuals can be formed. Knowing this, Inhabitants 5, 6
and 7 announce that they become outsiders, expecting that the remaining four
will make a state.

In Year 5 Inhabitant 5 joins the state. This is because the total income that
Inhabitants 1, 2, 3 and 4 can earn if no state is made has increased as a result
of the accumulation of the public capital. They will not make a state that does
not make their income larger than it would be if no state were formed. Realising
that a state is made if and only if Inhabitant 5 declares that he or she becomes
a citizen (on the supposition that Inhabitants 6 and 7 become outsiders),
Inhabitant 5 compares the income he or she obtains if no state is formed with
what he or she gets by joining the state of Inhabitants 1, 2, 3 and 4 (actually
2 leaves the state, but this has no effect on 5's decision); he or she finds the
latter is greater even though it is smaller than his or her income in the
previous year.

Inhabitant 6 joins the state in Year 9 and Inhabitant 7 also becomes its citizen
in Year 11. The reason they join the state is the same as the reason Inhabitant
5 becomes a citizen in Year 5. It should also be obvious why they become
citizens even though their income temporarily decreases.

An exceptional phenomenon is observed when Inhabitant 5 joins the state:
Inhabitant 2 leaves the state. This is because (he or she knows) Inhabitants 5,
4, 3 and 1 make a state; though Inhabitants 4, 3 and 1 do not agree to make a
state with Inhabitant 2, they agree to make a state with Inhabitant 5, who has
higher productivity. Nevertheless soon (actually in the next year) a four-person
state can no longer attract any group of four individuals, so that Inhabitant 2
becomes a citizen again to make a five-inhabitant state. Though each player
considers only how to maximize his or her payoff in the one-shot game, it seems
that he or she adapts his or her behavior to the capital stock as the one-shot
game is iterated. In other words, we can say that players share their roles and
cooperate with each other voluntarily.

Now the dynamics of θ and τ are described in Figure 2: both τ and θ increase
periodically, repeating a monotonic increase and a sharp fall. Every time
either value becomes nearly equal to unity, the number of citizens
increases so that (for the reason mentioned above) they can cooperate with a
smaller value. This trick, however, works only until the capital stock reaches a
certain amount.

The dynamics of the capital stock are shown in Figure 3. It is only natural that
the capital stock increases monotonically in our model, where capital never
depreciates. It is also apparent why the speed and the final level of capital
accumulation are greater when Inhabitant 1 declares whether he or she becomes a
citizen or not: then those inhabitants with higher productivity become citizens,
and they pay more taxes to accumulate the capital stock.




5 Concluding remarks

Though we discussed only very simple cases in the previous sections, we have
carried out further mathematical analysis and simulations for more general cases.
For example, our model shows that there are cases where some citizens join the
state in order to evade taxes. This may seem irrational, because remaining an
outsider is better than becoming a tax evader who may be charged penalties, but a
state may not be formed without tax evaders, who increase the enforcer’s expected
income. We are now analysing these cases, which will hopefully deepen our
understanding of the emergence of the state.

Our analysis could be criticised for assuming subgame perfection, which
often fails to hold in experiments (probably because of a lack of the
required computational power and/or adherence to fairness; see for example Davis
and Holt (1993)). Certainly more realistic cases, in particular cases where all
or some individuals behave adaptively rather than perfectly rationally, should
be examined. We are currently extending our analysis in this direction.

We believe that the mathematical analysis of game theory and evolutionary
simulations can be complementary approaches. Although our study is still at an
early stage, we should be grateful if the reader could see this possibility
in our game-theoretical analysis with computer simulations.

This project “Methodology of Emergent Synthesis” (JSPS-RFTF96P00702) 
has been supported by the Research for the Future Program of the Japan Society 
for the Promotion of Science.

Abstract. We have already shown that the relation between neural networks
and linguistic knowledge is bidirectional for pattern classification
problems. That is, neural networks are trained by given linguistic rules, 
and linguistic rules are extracted from trained neural networks. In this 
paper, we illustrate the bidirectional relation for function approximation 
problems. First we show how linguistic rules and numerical data can be 
simultaneously utilized in the learning of neural networks. In our learn- 
ing scheme, antecedent and consequent linguistic values are specified 
by membership functions of fuzzy numbers. Thus each linguistic rule is 
handled as a fuzzy input-output pair. Next we show how linguistic rules 
can be extracted from trained neural networks. In our rule extraction 
method, linguistic values in the antecedent part of each linguistic rule 
are presented to a trained neural network for determining its consequent 
part. The corresponding fuzzy output from the trained neural network 
is calculated by fuzzy arithmetic. The consequent part of the linguistic 
rule is determined by comparing the fuzzy output with linguistic values.
Finally we suggest some extensions of our rule extraction method. 

Key words: Learning of neural networks, hybrid learning, fuzzy neural 
systems, linguistic knowledge, rule extraction. 



1 Introduction 

When multi-layer feedforward neural networks are used as information process- 
ing systems such as classifiers and function approximators, they are usually han- 
dled as black box models. That is, we do not know why a trained neural network 
produces a particular output (e.g., classification, decision making, and prediction)
for a new input vector. Several attempts have been made to improve the
transparency of neural networks. One approach is rule extraction from trained
neural networks. In this approach, black box models are explained by extracted
rules. Various methods [1-3] have been proposed for extracting non-fuzzy if-then
rules and fuzzy if-then rules. Almost all of those approaches were designed for
extracting classification rules from neural networks trained for pattern
classification problems. In this paper, we discuss the transparency of neural networks
trained for function approximation problems. We use linguistic rules, which are 
fuzzy if-then rules with linguistic interpretation, for explaining neural networks. 

Another approach for improving the transparency is the learning of neural 
networks from experts’ knowledge. We have already shown that neural networks 
can be trained by fuzzy if-then rules [4]. In our learning method, neural networks 
are trained by simultaneously utilizing linguistic knowledge and numerical data. 
Based on our former studies on the fuzzy rule extraction [3] and the learning from 
fuzzy if-then rules [4], we have shown that the relation between neural networks
and linguistic knowledge is bidirectional for pattern classification problems [5]. 
In this paper, we illustrate the bidirectional relation between neural networks 
and linguistic knowledge for function approximation problems. 

2 Learning from Linguistic Rules 

Our problem in this section is to train a neural network for approximately real- 
izing an unknown nonlinear function with n inputs and a single output. For this 
task, we use a standard three-layer feedforward neural network [6] with n input 
units, $n_H$ hidden units and a single output unit.

We assume that we have a set of input-output pairs obtained from the
unknown nonlinear function as training data. We denote the training data as
$(\mathbf{x}_p, y_p)$, $p = 1, 2, \ldots, m_{\mathrm{Data}}$, where
$\mathbf{x}_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ is an n-dimensional
input vector, $y_p$ is the corresponding output value, and $m_{\mathrm{Data}}$ is
the number of the given input-output pairs. We also assume that we have linguistic
knowledge about the unknown nonlinear function. We denote the linguistic knowledge
as the following linguistic rules:

$$\text{Rule } R_q: \text{ If } x_1 \text{ is } A_{q1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{qn} \text{ then } y \text{ is } B_q, \quad q = 1, 2, \ldots, m_{\mathrm{Rule}}, \tag{1}$$

where $R_q$ is the label of the q-th linguistic rule, the $A_{qi}$'s ($i = 1, 2, \ldots, n$)
are antecedent linguistic values such as "small" and "large", $B_q$ is a consequent
linguistic value, and $m_{\mathrm{Rule}}$ is the number of the given linguistic rules.
We assume that the meaning of each linguistic value is specified by its membership
function. That is, we handle linguistic values as fuzzy numbers. In Fig. 1 we show
membership functions of typical linguistic values.
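To make the treatment of linguistic values as fuzzy numbers concrete, the following is a minimal sketch in Python. It assumes symmetric triangular membership functions on the unit interval with peaks at 0, 0.25, 0.5, 0.75 and 1; these shapes, and the names `LINGUISTIC_VALUES`, `membership`, `level_set` and `crisp`, are illustrative assumptions and are not taken from Fig. 1.

```python
# Assumed triangular fuzzy numbers (left, peak, right) for five linguistic values.
# The peaks and the spread of 0.25 are illustrative, not read off Fig. 1.
LINGUISTIC_VALUES = {
    "S": (-0.25, 0.0, 0.25),
    "MS": (0.0, 0.25, 0.5),
    "M": (0.25, 0.5, 0.75),
    "ML": (0.5, 0.75, 1.0),
    "L": (0.75, 1.0, 1.25),
}


def membership(tri, x):
    """Membership grade of x in the triangular fuzzy number tri."""
    left, peak, right = tri
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def level_set(tri, h):
    """h-level set (0 < h <= 1): the interval of points with grade >= h."""
    left, peak, right = tri
    return (left + h * (peak - left), right - h * (right - peak))


def crisp(x):
    """A real number viewed as a degenerate fuzzy number."""
    return (x, x, x)
```

Under these assumptions, `level_set(LINGUISTIC_VALUES["MS"], 0.5)` gives the interval (0.125, 0.375), and every level set of `crisp(x)` collapses to the single point x, which is how numerical data and linguistic rules can share one representation.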

In the learning of the neural network, the given linguistic rules in (1) are
handled as fuzzy input-output pairs $(A_p, B_p)$, $p = 1, 2, \ldots, m_{\mathrm{Rule}}$,
where $A_p = (A_{p1}, \ldots, A_{pn})$ is a fuzzy vector. Since real numbers can be
viewed as a special case of fuzzy numbers, the given numerical data
$(\mathbf{x}_p, y_p)$, $p = 1, 2, \ldots, m_{\mathrm{Data}}$,
are also handled as fuzzy input-output pairs. This means that two kinds of
available information (i.e., numerical data and linguistic knowledge) are handled
in a single framework as fuzzy input-output pairs. In this paper, we denote
the linguistic knowledge $(A_p, B_p)$, $p = 1, 2, \ldots, m_{\mathrm{Rule}}$, and the
numerical data $(\mathbf{x}_p, y_p)$, $p = 1, 2, \ldots, m_{\mathrm{Data}}$, as
$(A_p, B_p)$, $p = 1, 2, \ldots, m$, where $m = m_{\mathrm{Rule}} + m_{\mathrm{Data}}$.
From the above discussion, we can see that our problem is to train the
neural network by the fuzzy input-output pairs $(A_p, B_p)$, $p = 1, 2, \ldots, m$. When
the fuzzy vector $A_p = (A_{p1}, \ldots, A_{pn})$ is presented to the neural network, the
input-output relation of each unit can be written as follows [4]:

$$\text{Input units:} \quad O_{pi} = A_{pi}, \quad i = 1, 2, \ldots, n, \tag{2}$$

$$\text{Hidden units:} \quad O_{pj} = f\Bigl(\sum_{i=1}^{n} w_{ji} \cdot O_{pi} + \theta_j\Bigr), \quad j = 1, 2, \ldots, n_H, \tag{3}$$

$$\text{Output unit:} \quad O_p = f\Bigl(\sum_{j=1}^{n_H} w_j \cdot O_{pj} + \theta\Bigr), \tag{4}$$

where $w_{ji}$ and $w_j$ are connection weights, $\theta_j$ and $\theta$ are biases, and
$f(\cdot)$ is the sigmoidal activation function: $f(x) = 1/\{1 + \exp(-x)\}$. Our neural
network architecture in (2)-(4) is the same as the standard three-layer feedforward
neural network [6] except that the input and output of each unit are fuzzy numbers.
As in various studies on fuzzified neural networks [7,8], the fuzzy input-output
relation of each unit is defined by fuzzy arithmetic [9] and numerical calculation
is performed by interval arithmetic [10] on the level sets of fuzzy numbers.
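The interval calculation on a single h-level set can be sketched as follows. This is a minimal illustration, assuming real-valued weights and biases and intervals represented as `(lower, upper)` tuples; the function names are hypothetical and the code is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_interval_sum(weights, intervals, bias):
    """Interval of sum_i w_i * X_i + bias for intervals X_i = (lower, upper)."""
    lo = hi = bias
    for w, (a, b) in zip(weights, intervals):
        if w >= 0:
            lo += w * a
            hi += w * b
        else:  # a negative weight swaps the roles of the two limits
            lo += w * b
            hi += w * a
    return lo, hi


def unit_output(weights, intervals, bias):
    """Interval output of one sigmoidal unit (the sigmoid is increasing)."""
    lo, hi = weighted_interval_sum(weights, intervals, bias)
    return sigmoid(lo), sigmoid(hi)


def forward(level_sets, hidden_w, hidden_b, out_w, out_b):
    """Propagate one h-level set of the fuzzy input vector through the network."""
    hidden = [unit_output(w, level_sets, b) for w, b in zip(hidden_w, hidden_b)]
    return unit_output(out_w, hidden, out_b)
```

Calling `forward` once per h-level yields the level sets of the fuzzy output. When every input interval is degenerate (lower equals upper), the calculation reduces to the ordinary feedforward pass for real-valued inputs.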

In the learning of the neural network, we have to define a cost function to
be minimized. We measure the difference (or distance) between the actual fuzzy
output $O_p$ and the fuzzy target $B_p$ using their h-level sets as

$$d(B_p, O_p) = \sum_{h} \bigl([B_p]_h^L - [O_p]_h^L\bigr)^2 / 2 + \sum_{h} \bigl([B_p]_h^U - [O_p]_h^U\bigr)^2 / 2, \tag{5}$$

where $[\cdot]_h^L$ and $[\cdot]_h^U$ are the lower limit and the upper limit of the
h-level set $[\cdot]_h$ of a fuzzy number, respectively.
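As a small illustration, the distance in (5) can be computed from two lists of h-level sets. The sketch below assumes that both lists use the same h-levels in the same order and that each level set is a `(lower, upper)` tuple; the function name is hypothetical.

```python
def level_set_cost(target_levels, output_levels):
    """Distance (5): half squared errors of lower and upper limits, summed over h."""
    total = 0.0
    for (b_lo, b_hi), (o_lo, o_hi) in zip(target_levels, output_levels):
        total += (b_lo - o_lo) ** 2 / 2 + (b_hi - o_hi) ** 2 / 2
    return total
```

The cost is zero only when every lower and upper limit of the output coincides with that of the target, so minimizing it matches both the position and the spread of the fuzzy output.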

In the same manner as the back-propagation algorithm [6], a learning algo- 
rithm can be derived for adjusting the connection weights and biases from the 
cost function in (5). For details of the derivation, see Ishibuchi et al. [4,8].

3 Computer Simulations 

Fig. 1. Membership functions of typical linguistic values (S: small, MS: medium small,
M: medium, ML: medium large, and L: large).

Let us illustrate the learning from numerical data and linguistic knowledge by
computer simulations on a simple numerical example. As an unknown nonlinear
function, we used the following one:

2/ = /(x) = (45 + 45)2/4, ( 6 ) 

This nonlinear function is depicted in Fig. 2 (a). As numerical data, we generated
200 input-output pairs by randomly specifying input vectors
$\mathbf{x}_p = (x_{p1}, x_{p2})$, $p = 1, 2, \ldots, 200$, in the input space (i.e.,
the unit square $[0, 1] \times [0, 1]$). First we trained
a three-layer feedforward neural network with two input units, five hidden units,
and a single output unit using the 200 input-output pairs. We employed the
standard back-propagation algorithm [6] with the momentum term (the learning
rate and the momentum constant were specified as 0.25 and 0.9, respectively).
The shape of the output from the trained neural network after 10000 epochs is
shown in Fig. 2 (b). From the comparison between Fig. 2 (a) and Fig. 2 (b),
we can see that the neural network could approximate the unknown
nonlinear function very well.

Fig. 2. Learning from numerical data. (a) Unknown function. (b) Result of the learning
from numerical data.

Next we trained the same neural network using only 100 input-output pairs
whose first input values are less than 0.5 (i.e., $x_1 < 0.5$). The shape of the
output from the trained neural network is shown in Fig. 3 (a). Since the given
numerical data were not sufficient, the neural network could not approximate
the unknown function well in this case. Finally we trained the same neural
network using the 100 input-output pairs and the following linguistic knowledge:

If $x_1$ is large and $x_2$ is small then $y$ is medium small,

If $x_1$ is large and $x_2$ is large then $y$ is large,

where the membership function of each linguistic value is shown in Fig. 1. It 
should be noted that these two linguistic rules are not sufficient to describe the 
unknown nonlinear function in Fig. 2 (a). In Fig. 3 (b), we show the shape of 
the output from the neural network trained by the insufficient numerical data 
and the insufficient linguistic knowledge. As we can see from Fig. 3 (b), we
obtained a good approximation result because these two kinds of information
were simultaneously utilized in the learning of the neural network.

Fig. 3. Learning from numerical data and linguistic knowledge. (a) Result of the
learning from insufficient numerical data. (b) Result of the simultaneous learning
from numerical data and linguistic knowledge.

4 Linguistic Rule Extraction 

Our task in this section is to extract linguistic rules from trained neural networks. 
We assume that a trained neural network has already been given. We do not 
assume any particular network architectures or learning algorithms. 

Our linguistic rules to be extracted from the trained neural network are of the
same type as in the previous sections. In our rule extraction method, we examine
all combinations of antecedent linguistic values. When we have the five linguistic
values in Fig. 1 for each of n inputs, the total number of possible combinations of
antecedent linguistic values is $5^n$. For extracting a linguistic rule, first we present
a combination of antecedent linguistic values to the trained neural network. The
antecedent part of the linguistic rule is specified by these linguistic values. Then
we calculate the corresponding fuzzy output from the trained neural network
using the fuzzy input-output relation in (2)-(4). Finally the fuzzy output is
compared with each linguistic value in order to determine the consequent part
of the linguistic rule. The difference (or distance) between the fuzzy output and
each linguistic value is measured by (5). The closest linguistic value to the fuzzy
output is chosen as the consequent of the linguistic rule. Not only individual
linguistic values but also their combinations (e.g., small or medium small) are
considered as candidates for the consequent of each linguistic rule.
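The extraction loop described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the membership functions are triangular with assumed peaks, the set of h-levels is an arbitrary choice, only single linguistic values (not combinations such as "small or medium small") are considered as consequents, and `toy_network` is a hypothetical stand-in for a trained network (the interval extension of the average of the inputs).

```python
import itertools

# Assumed triangular membership functions (left, peak, right); not read off Fig. 1.
VALUES = {
    "S": (-0.25, 0.0, 0.25),
    "MS": (0.0, 0.25, 0.5),
    "M": (0.25, 0.5, 0.75),
    "ML": (0.5, 0.75, 1.0),
    "L": (0.75, 1.0, 1.25),
}
H_LEVELS = (0.2, 0.4, 0.6, 0.8, 1.0)  # assumed set of h-levels


def level_set(tri, h):
    left, peak, right = tri
    return (left + h * (peak - left), right - h * (right - peak))


def distance(levels_a, levels_b):
    """Distance between two fuzzy numbers given as lists of h-level sets."""
    return sum((a_lo - b_lo) ** 2 / 2 + (a_hi - b_hi) ** 2 / 2
               for (a_lo, a_hi), (b_lo, b_hi) in zip(levels_a, levels_b))


def toy_network(intervals):
    """Hypothetical stand-in for a trained network: interval mean of the inputs."""
    lo = sum(a for a, _ in intervals) / len(intervals)
    hi = sum(b for _, b in intervals) / len(intervals)
    return (lo, hi)


def extract_rules(network, n_inputs):
    """Examine all 5**n antecedent combinations and pick the closest consequent."""
    candidates = {name: [level_set(tri, h) for h in H_LEVELS]
                  for name, tri in VALUES.items()}
    rules = {}
    for antecedent in itertools.product(VALUES, repeat=n_inputs):
        output = [network([level_set(VALUES[name], h) for name in antecedent])
                  for h in H_LEVELS]
        rules[antecedent] = min(
            candidates, key=lambda name: distance(candidates[name], output))
    return rules
```

With two inputs, `extract_rules(toy_network, 2)` returns a dictionary with 25 entries, one per cell of a 5 x 5 rule table. Replacing `toy_network` by the interval forward calculation of an actual trained network gives the procedure described in the text.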

Let us illustrate our rule extraction method by a simple numerical example.
In our computer simulation, we used the three-layer feedforward neural network
that had already been trained by the back-propagation algorithm (Fig. 2 (b)).
Our task is to extract linguistic rules from the trained neural network. As
antecedent and consequent linguistic values, we used the five linguistic values
in Fig. 1. Thus we examined 25 combinations of antecedent linguistic values,
each of which was presented to the trained neural network. In Table 1 we show
extracted linguistic rules in the form of a 5 x 5 rule table.

5 Some Extensions of Rule Extraction Method 

Our rule extraction method can be modified in various manners. The following 
points may be important issues to be discussed for improving our method: 

1. Coping with high-dimensional problems. 

2. Avoiding the increase in excess fuzziness in fuzzy outputs. 

3. Improving the understandability of extracted linguistic rules. 

The first issue is the handling of high-dimensional problems. In our rule extrac- 
tion method in the previous section, we examined all combinations of antecedent 
linguistic values. Such an exhaustive examination cannot be performed in the
case of high-dimensional problems due to the curse of dimensionality. One ap- 
proach to the handling of high-dimensional problems is to extract only general 
linguistic rules with a few antecedent conditions. Specific linguistic rules with 
long antecedent parts (i.e., many antecedent conditions) are not extracted in 
order to decrease the computational load. 

The second issue is the increase in excess fuzziness included in fuzzy outputs 
obtained by the feedforward calculation in neural networks. This undesirable 
phenomenon is the same as the increase in excess width in interval arithmetic. 
It is known in interval arithmetic that the excess width can be decreased by 
subdividing intervals into multiple subintervals [10]. Such a subdivision method 
can be utilized for decreasing the excess fuzziness included in fuzzy outputs from 
neural networks. In the numerical calculation of the fuzzy outputs, h-level sets
of linguistic inputs are subdivided into multiple subintervals. This is illustrated
in Fig. 4. Since sharper fuzzy outputs are obtained by subdividing h-level sets
of linguistic inputs as in Fig. 4, the consequent part of extracted linguistic rules
can be specified more precisely. Table 2 shows examples of linguistic rules extracted
using the subdivision method in Fig. 4 (a). The linguistic rules in Table 2 were
extracted from the trained neural network used in the previous section. From
the comparison between Table 1 with no subdivision and Table 2, we can see
that the fuzziness of the consequent linguistic values of the extracted rules was
decreased by the subdivision method.
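The effect of subdivision on excess width can be seen on a one-dimensional example. The sketch below is an illustration only: it uses the function g(x) = x(1 - x), chosen here because the variable appears twice and naive interval arithmetic therefore overestimates its range; it is not the network or the function used in the simulations, and all names are hypothetical.

```python
def interval_mul(x, y):
    """Product of two intervals given as (lower, upper) tuples."""
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return min(products), max(products)


def g_interval(x):
    """Naive interval extension of g(x) = x * (1 - x)."""
    one_minus_x = (1.0 - x[1], 1.0 - x[0])
    return interval_mul(x, one_minus_x)


def g_subdivided(x, pieces):
    """Evaluate g on equal subintervals and return the hull of the results."""
    width = (x[1] - x[0]) / pieces
    lows, highs = [], []
    for k in range(pieces):
        sub = (x[0] + k * width, x[0] + (k + 1) * width)
        lo, hi = g_interval(sub)
        lows.append(lo)
        highs.append(hi)
    return min(lows), max(highs)
```

On the interval [0, 1] the true range of g is [0, 0.25]. The naive evaluation `g_interval((0.0, 1.0))` returns (0.0, 1.0), while `g_subdivided((0.0, 1.0), 4)` returns (0.0, 0.375): the result still encloses the true range but with much less excess width, which is the effect exploited for the h-level sets of linguistic inputs.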

The third issue is the understandability of extracted linguistic rules. It is a 
time-consuming task for human users to manually examine extracted linguistic 
rules when hundreds of rules are extracted. In this case, we can use genetic al- 
gorithms for selecting a small number of significant linguistic rules from a large 
number of extracted rules. GA-based rule selection methods have been proposed 
for selecting a small number of linguistic rules for pattern classification problems 
[11,12]. While those methods cannot be directly applied to function approximation
problems, we can use the same coding method and genetic operations.
The definition of a fitness function and a rule generation procedure should be
modified in the application to function approximation problems. Extracted
linguistic rules from the trained neural network are used as candidate rules for the
rule selection. A fitness function is defined by two terms: the fitting ability of 
selected rules to training data and the number of selected rules. 

Fig. 4. Examples of subdivision methods. (a) Simple subdivision. (b) Hierarchical
subdivision.

Table 2. Extracted linguistic rules using a subdivision method.

ML, L, MS, MS, MS, ML or L, ML, S or MS, MS or M, S or MS, MS,
MS, S, S, MS, S, S, S, S, S, MS



6 Conclusion 

In this paper, we illustrated the bidirectional relation between neural networks 
and linguistic knowledge for function approximation problems. That is, we out- 
lined two research directions: the learning of neural networks from linguistic 
knowledge and the linguistic rule extraction from trained neural networks. We 
also suggested some extensions of our rule extraction method. The learning from 
linguistic knowledge and the linguistic rule extraction can be utilized to improve 
the transparency of neural networks, which are usually handled as black box 
models. The main characteristic of our approach is that neural networks are 
linguistically explained. 

In this paper, we described the bidirectional relation between linguistic knowl- 
edge and multi-layer feedforward neural networks with the sigmoidal activation 
function. Since RBF networks can be viewed as a kind of fuzzy rule-based sys- 
tems [13,14], this bidirectional relation is much more straightforward in RBF 
networks. That is, a single basis function can be viewed as a fuzzy if-then rule.
While such a straightforward relation exists, the linguistic interpretation of each
fuzzy if-then rule represented as a basis function is not always easy.

Abstract. We apply evolutionary computations to Hopfield’s neural
network model of associative memory. In the model, a number of
patterns can be stored in the network as attractors if synaptic weights
are determined appropriately. So far, we have explored weight space to
search for the optimal weight configuration that creates attractors at the
location of patterns to be stored. In this paper, on the other hand, we
explore pattern space to search for attractors that are created by a fixed
weight configuration. All the solutions in this case are known a priori.
The purpose of this paper is to study the ability of a niching genetic
algorithm to locate these multiple solutions using the Hopfield model as
a test function.



1 Introduction

Associative memory is a dynamical system which has a number of stable states,
each with a domain of attraction around it (Komlos et al. 1988). If the system
starts at any state in the domain, it will converge to the stable state.
Hopfield (1982) proposed a fully connected neural network model of associative
memory in which information is stored by being distributed among neurons.

The Hopfield model consists of $N$ neurons and $N^2$ synapses. Each neuron
state is either active (+1) or quiescent (−1). When an arbitrary N-bit bipolar
pattern, a sequence of +1 and −1, is given to the network as an initial state,
the subsequent dynamical behavior of the neuron states is characterized by the
strengths of the synapses. The synaptic strengths are called weights, and the
weight from neuron j to neuron i is denoted as $w_{ij}$ in this paper. Provided the
synaptic weights are determined appropriately, the network can store some number
of patterns as attractors. Hopfield employed the so-called Hebbian rule (Hebb,
1949) to prescribe the weights. That is, to store $p$ bipolar patterns
$\xi^{\mu}$ ($\mu = 1, \ldots, p$), the weight values are determined as:

$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^{\mu} \xi_j^{\mu}.$$

An instantaneous state of a neuron is updated asynchronously (one neuron at a
time) as:

$$s_i(t+1) = \mathrm{sgn}\Bigl(\sum_{j=1}^{N} w_{ij} s_j(t)\Bigr),$$

where $s_i(t)$ is the state of the i-th neuron at time t. If an initial state converges
to one of the stored patterns as an equilibrium state, then the pattern is said
to be recalled. Furthermore, if an initial state chosen from the stored patterns
remains unchanged from the beginning, then the pattern is said to be stored as
a fixed point.
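The Hebbian prescription and the asynchronous update can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not stated in the text: the self-couplings are kept in the sum, sgn(0) is taken as +1, and neurons are visited in a random order within each sweep. The function names are hypothetical.

```python
import random


def hebbian_weights(patterns):
    """w_ij = (1/N) * sum over stored patterns of xi_i * xi_j."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
            for i in range(n)]


def update_neuron(weights, state, i):
    """Asynchronous update of neuron i in place (sgn(0) taken as +1)."""
    field = sum(w * s for w, s in zip(weights[i], state))
    state[i] = 1 if field >= 0 else -1


def run(weights, initial_state, sweeps, rng):
    """Update every neuron once per sweep, in an assumed random order."""
    state = list(initial_state)
    for _ in range(sweeps):
        order = list(range(len(state)))
        rng.shuffle(order)
        for i in order:
            update_neuron(weights, state, i)
    return state
```

For example, with the two orthogonal 8-bit patterns `[1] * 8` and `[1, -1] * 4`, both patterns remain unchanged under `run`, so they are stored as fixed points, and the first pattern with one flipped bit is recalled within a single sweep.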

In analyzing the Hopfield model, there have been basically two different
approaches: one is to explore pattern space searching for attractors under a specific
weight configuration, and the other is to explore weight space searching for an
appropriate weight configuration that stores a given set of patterns. To be more
specific, the former is an analysis of the Hamiltonian energy as a function of
all the possible configurations of the bipolar pattern given to the network, where
the synaptic weights are pre-specified using a learning algorithm, usually
Hebb’s rule, so that the network stores a set of p given patterns. In this
context, the model for p = 1 corresponds to the Mattis model of spin-glass (Mattis
1976), in which the Hamiltonian energy has two minima, while the model for
infinitely large p corresponds to the Sherrington-Kirkpatrick model (1975), in
which the synaptic weights become Gaussian random variables. Analyses of the
former type have been made in between these two extreme cases (see Amit,
1989). The latter analysis was addressed by Gardner (1988). She discussed the
optimal weight configurations for a fixed number of given patterns in terms of
the volume of the solutions in weight space, suggesting that the volume shrinks
to zero as p approaches 2N. In short, the former approach searches for
the optimal pattern configurations which minimize the Hamiltonian energy in
pattern space with the weights being fixed, while the latter searches for the weight
configurations in weight space that optimally store a set of given fixed patterns.

So far, we have studied the model with the latter approach. We have explored
fitness landscapes of the model defined on weight space, and have found
many solutions that store more patterns, or store them with larger basins of
attraction, than, e.g., the Hebbian synaptic weights (Imada et al. 1997a, 1997b).
Now, our interest is in the number and distribution of these solutions over the
whole weight space, which is still an open problem. We think the niching GA is
one of the appropriate tools to pursue these problems. However, since the
$N^2$-dimensional continuous weight space is much more difficult to wander around
than the N-dimensional discrete pattern space, we explore here the pattern space
to see, as a preliminary step, how our fitness function works under the niching
method. In other words, we use the model as a test function of the niching
technique in the sense that all solutions are known a priori, as in other studies
using pure mathematical test functions.

Since we explore the fitness landscape defined on the pattern space in this
paper, the fitness function of an arbitrary configuration of an N-bit bipolar pattern
should be defined. Here we evaluate the fitness of a pattern according to how
similar the instantaneous neuron states $s_i(t)$, after the pattern is given to the
network, are to any of the stored patterns. The similarity as a function of time
is defined as:

$$m^{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu} s_i(t).$$

The temporal average of $m^{\mu}(t)$ is calculated for each stored pattern, and the
maximum value among them is assigned to the input pattern as its fitness. Thus,
the fitness value $f$ is evaluated as follows:

$$f = \max\Bigl\{ \frac{1}{t_0} \sum_{t=1}^{t_0} m^{\mu}(t) \;\Big|\; \mu = 1, 2, \ldots, p \Bigr\}.$$

In this paper, $t_0$ is set to $2N$, twice the number of neurons. Note that a fitness
of 1 implies that the pattern is a fixed point attractor, while a fitness less than 1
includes many other cases.
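The fitness evaluation can be sketched as follows. Several details are assumptions made for this illustration and are not specified in the text: one time step is taken to be the update of a single neuron, neurons are visited cyclically in index order, the state at the first time step is the input pattern itself, and sgn(0) is taken as +1. The function names are hypothetical.

```python
def hebbian_weights(patterns):
    """w_ij = (1/N) * sum over stored patterns of xi_i * xi_j."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
            for i in range(n)]


def overlap(pattern, state):
    """Similarity m between a stored pattern and the current state."""
    return sum(a * b for a, b in zip(pattern, state)) / len(state)


def fitness(candidate, patterns, weights, t0=None):
    """Maximum over stored patterns of the time-averaged similarity."""
    n = len(candidate)
    t0 = 2 * n if t0 is None else t0
    state = list(candidate)
    totals = [0.0] * len(patterns)
    for t in range(t0):
        # Record the similarity of the current state to every stored pattern.
        for mu, pattern in enumerate(patterns):
            totals[mu] += overlap(pattern, state)
        # Assumed schedule: one neuron per time step, in cyclic index order.
        i = t % n
        field = sum(w * s for w, s in zip(weights[i], state))
        state[i] = 1 if field >= 0 else -1
    return max(total / t0 for total in totals)
```

Under these assumptions a stored pattern that is a fixed point receives fitness exactly 1, whereas a pattern that only converges to a stored pattern receives a value slightly below 1, because its first recorded states still differ from the attractor.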

2 Niching Methods 

A niching genetic algorithm is a genetic algorithm (GA) (Holland 1975) that was
devised to locate multiple solutions simultaneously. To do this, sharing
(Goldberg & Richardson 1987), for example, derates the fitness value of each
individual using a sharing function which reflects the number of similar individuals
in the population. Crowding (De Jong 1975), on the other hand, reduces the number
of similar individuals in the population by replacing some of the individuals with
new individuals. Here, we employ the deterministic crowding GA (Mahfoud 1992)
because of its niching capability (Mahfoud 1995) as well as its simplicity of
implementation.

In each generation, as neatly summarized by Mahfoud (1995), the current 
population is reproduced as follows. 

(1) Choose two parents, $p_1$ and $p_2$, at random, with no parent being chosen more
than once.

(2) Produce two children, $c_1$ and $c_2$, with uniform crossover (Syswerda 1989).

(3) Mutate the children by flipping bits chosen at random with probability $p_m$,
yielding $c'_1$ and $c'_2$.

(4) Replace parents with children as follows:

• IF $d(p_1, c'_1) + d(p_2, c'_2) \le d(p_1, c'_2) + d(p_2, c'_1)$

* IF $f(c'_1) > f(p_1)$ THEN replace $p_1$ with $c'_1$

* IF $f(c'_2) > f(p_2)$ THEN replace $p_2$ with $c'_2$

• ELSE

* IF $f(c'_2) > f(p_1)$ THEN replace $p_1$ with $c'_2$

* IF $f(c'_1) > f(p_2)$ THEN replace $p_2$ with $c'_1$

where $d(\zeta_1, \zeta_2)$ is the Hamming distance between two points
$(\zeta_1, \zeta_2)$ in pattern configuration space. The process of producing
children is repeated until all the population have taken part in the process. Then
the cycle of reconstructing a new population and restarting the search is repeated
until all the global optima are found or a set maximum number of generations has
been reached.
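One generation of the procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: individuals are lists of +1 and −1, the mutation step is interpreted as flipping each bit independently with probability `pm`, the population size is assumed to be even, and all function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions in which two bipolar patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def uniform_crossover(p1, p2, rng):
    """Each position is swapped between the two children with probability 0.5."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2


def mutate(child, pm, rng):
    """Flip each bit independently with probability pm (assumed interpretation)."""
    return [-bit if rng.random() < pm else bit for bit in child]


def deterministic_crowding_step(population, fitness, pm, rng):
    """One generation: every individual is used as a parent exactly once."""
    order = list(range(len(population)))
    rng.shuffle(order)
    for a, b in zip(order[0::2], order[1::2]):
        p1, p2 = population[a], population[b]
        c1, c2 = uniform_crossover(p1, p2, rng)
        c1, c2 = mutate(c1, pm, rng), mutate(c2, pm, rng)
        # Pair each parent with the child that is closer in Hamming distance.
        if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
            c1, c2 = c2, c1
        # A child replaces its paired parent only if it is strictly fitter.
        if fitness(c1) > fitness(p1):
            population[a] = c1
        if fitness(c2) > fitness(p2):
            population[b] = c2
```

Because a child competes only against the parent it most resembles, and replaces it only when strictly fitter, the fitness of each population slot never decreases from one generation to the next, and individuals near different peaks do not compete with each other directly.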

3 Experimental Results 

Experiments were carried out on networks with 49 neurons. For a given set of p
patterns, a configuration of synaptic weights in a network is calculated by
Hebb’s rule. Thus, the network with 49 neurons now stores these p patterns as
fixed point attractors, so far as p does not exceed the storage capacity. Namely,
the fitness landscape defined on pattern space has p global peaks exactly at the
location of the patterns. Then the deterministic crowding GA searches for these
attractors. The parameters of the GA employed in this paper are as follows. The
population size is 200; the mutation probability $p_m$ is set to 0.05; and the number
of iterations is limited to a maximum of 12,000 generations.

To take a bird’s-eye view of the landscape, we picked 240,000 samples
randomly from pattern space. The number of samples was chosen to be equal
to the typical number of function evaluations of our usual GA implementations.
We observed that the fitness values were distributed over a range from 0.01 to 0.95.
At first glance, this broad distribution might suggest that the search for the
optimal solution(s) is not so difficult. In fact, Davis (1990) pointed out that some
problems can be solved more easily with a simple hill-climber than with genetic
algorithms. So, we experimented with random hill climbing on the landscape.
Starting at a randomly chosen position in the space, a point is mutated 200 times,
and then moves to the fittest point among the mutants. We repeated runs varying p,
but all we observed was early stagnation at a local optimum. The solutions of
our problem were not reachable by a simple hill-climber, which satisfies
Whitley et al.’s (1995) demand that test functions should be resistant to simple hill
climbing searches.

Before applying deterministic crowding, we also studied the problem with
a simple GA where two parents are randomly selected from the fittest 40% of
the population, and the worst 60% are replaced with the children produced,
with other schemes such as recombination and mutation being the same as those
described above. The simple GA easily finds the embedded patterns, but only
one of them at a time. A typical result of the best and average fitness versus
generation obtained by the simple GA is shown in Figure 1.

Next, we applied the deterministic crowding GA to this problem. We investigated
how many of the embedded patterns can be located. For a specific number
of patterns p, starting with p = 2, we studied 30 runs, varying the random
number seed at the start of each run. The experiments are iterated with p being
incremented if any of the 30 runs locates all the embedded solutions. As a result,
we observed that when p = 2, 19 out of 30 runs located all the embedded
patterns; then as p increases, the number of successful runs decreases, i.e., when
p = 3, 4, 5, 6 the number of successful trials out of 30 runs was 14, 8, 3, 0,
respectively. Some examples of the number of individuals that converged on each
of the given patterns when a GA run terminated are shown in Table 1.

Fig. 1. Best and average fitness versus generation obtained by a simple GA.



Table 1. Number of individuals converged to each attractor.

We can see that all the individuals eventually find some niche for
p < 6. How, then, does the number of individuals in each of these niches change as
evolution proceeds? An example is shown in Figure 2. Also, the total number of
these located solutions is shown in Figure 3, together with the result of a simple
GA. The figure shows that all the individuals in the deterministic crowding GA
converge to one of the solutions, while the simple GA reaches about half of the
solutions.




Fig. 2. Number of individuals converged on each niche. 




Fig. 3. Number of individuals that reach solutions.

4 Conclusion

We have described an application of the deterministic crowding GA, one of the
multi-modal function optimization techniques, to the Hopfield model of associative
memory. Our concern is to obtain multiple solutions of the weight configuration
in weight space. We think the deterministic crowding GA is one of the
candidates for the purpose. As a preliminary stage, the fitness landscape defined
on pattern space instead of weight space was explored in this paper. As expected,
we observed that the convergence behavior of the deterministic crowding GA was
different from that of a simple GA. The deterministic crowding GA converges
to multiple solutions, while a simple GA converges to one of these solutions. The
number of solutions located by the deterministic crowding GA varies from trial to
trial. When more than 6 patterns are given, the deterministic crowding GA located
only a part of them. One possible reason for these results is that the number of
embedded patterns exceeds the capability of the deterministic crowding GA to
make niches on all of them. The other possibility is that they are due to our fitness
evaluation. We are now experimenting with the Hamiltonian energy function for
fitness evaluation.

The approach we have taken in this paper, that is, exploration in pattern space
to search for solutions, can be extended to the problem of searching for
solutions in weight space. Needless to say, much remains to be done. The
simulations in this paper were made under specific limitations, such as low
dimensionality and low cardinality of genes. In experiments with a network of
49 neurons, the search space to be explored is the set of vertices of the
49-dimensional hypercube, while the weight space to be explored will be the
entire continuous region inside the 2401-dimensional hypercube.
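The two sizes quoted above follow directly from the number of neurons. The short calculation below only restates that arithmetic; the variable names are illustrative, and counting one weight per ordered pair of neurons is an assumption chosen because it reproduces the figure of 2401.

```python
n_neurons = 49

# Pattern space: every neuron is binary, so candidates are hypercube vertices.
pattern_space_size = 2 ** n_neurons

# Weight space: one real-valued weight per ordered pair of neurons.
weight_space_dim = n_neurons ** 2

print(pattern_space_size)  # 562949953421312
print(weight_space_dim)    # 2401
```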

Although the analysis of the fully-connected network model of associative
memory is rather classical, many issues are still unknown. Our goal in applying
a multi-modal GA to the model is to address one of these issues, that is, the
distribution of solutions in weight space. We believe these studies using
evolutionary algorithms shed new light on the analysis of the model.



Abstract. Negatively correlated neural networks (NCNNs) have been
proposed to design neural network (NN) ensembles [1]. The idea of NCNNs
is to encourage different individual NNs in the ensemble to learn
different parts or aspects of the training data so that the ensemble can
learn the whole training data better. The cooperation and specialisation
among different individual NNs are considered during the individual NN
design. This provides an opportunity for different NNs to interact with
each other and to specialise. In this paper, NCNNs are applied to two
time series prediction problems (i.e., the Mackey-Glass differential
equation and the chlorophyll-a prediction in Lake Kasumigaura). The
experimental results show that NCNNs can produce NN ensembles with good
generalisation ability.



1 Introduction 

Many real-world problems are too large and too complex for a single monolithic
system to solve alone. There are many examples from both natural and artificial
systems which show that a composite system consisting of several subsystems
can reduce the total complexity of the system while solving a difficult problem
satisfactorily. The success of neural network (NN) ensembles in improving a
classifier’s generalisation is a typical example [5]. However, designing NN
ensembles is a very difficult task. It relies heavily on human experts and prior
knowledge about the problem. This paper describes a cooperative learning
algorithm which can create negatively correlated neural networks (NCNNs)
automatically [1].

The idea behind NCNNs is to encourage different individual networks to
learn different parts or aspects of the training data so that the whole system
can learn the training data better. NCNNs are trained simultaneously rather than
independently or sequentially. Simultaneous training provides an opportunity for
individual NNs to cooperate and specialise.

NCNNs have been tested on a number of benchmark problems, including
regression and classification problems [1]. In all these problems, both the
input and the output are independent of time. This paper describes the
application of NCNNs to time series prediction problems, where the input and
output change over time. It is assumed that the appropriate response at a
particular point in time depends not only on the current input, but also on
previous inputs. One artificial and one real-world problem are used in this
paper to evaluate the effectiveness and efficiency of NCNNs. The experimental
results obtained by NCNNs are better than those obtained by other systems in
terms of prediction error. They also illustrate that NCNNs are applicable to a
wide range of problems, regardless of whether the input and output are static
or time-varying.

The rest of this paper is organised as follows: Section 2 briefly describes
NCNNs. Section 3 presents the experimental results of NCNNs and discussions.
Finally, Section 4 concludes with a summary of the paper and a few remarks.



2 Negative Correlation Learning 



Negative correlation learning has been successfully applied to NN ensembles
[1] for creating NCNNs. The idea of negative correlation learning is to
introduce a correlation penalty term into the error function of each individual
network in an NN ensemble so that the individual networks can be trained
simultaneously and interactively. The error function $E_i$ for individual $i$ in
negative correlation learning is defined by

$$E_i = \frac{1}{N}\sum_{n=1}^{N} E_i(n) = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{2}\bigl(F_i(n) - y(n)\bigr)^2 + \lambda\, p_i(n)\right] \qquad (1)$$



where $N$ is the number of training patterns, $E_i(n)$ is the value of the error
function of network $i$ at presentation of the $n$th training pattern, $F_i(n)$ is
the output of network $i$ on the $n$th training pattern, and $y(n)$ is the desired
output of the $n$th training pattern. The first term on the right side of Eq. (1)
is the empirical risk function of network $i$. The second term $p_i$ is a
correlation penalty function. The purpose of minimising $p_i$ is to negatively
correlate each individual’s error with errors for the rest of the ensemble. The
parameter $0 \le \lambda \le 1$ is used to adjust the strength of the penalty.

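The penalty function $p_i$ has the form:

$$p_i(n) = \bigl(F_i(n) - F(n)\bigr) \sum_{j \ne i} \bigl(F_j(n) - F(n)\bigr) \qquad (2)$$

where $F(n) = \frac{1}{M}\sum_{i=1}^{M} F_i(n)$ is the output of the NN ensemble on
the $n$th training pattern and $M$ is the number of individual networks in the
ensemble.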
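The definitions in Eqs. (1) and (2) can be checked on a small example. The sketch below evaluates the per-pattern error and penalty for a hypothetical ensemble of three outputs; the function names and the numerical values are assumptions chosen only for illustration.

```python
def ensemble_output(outputs):
    """F(n): simple average of the individual outputs F_i(n)."""
    return sum(outputs) / len(outputs)

def penalty(outputs, i):
    """p_i(n) = (F_i - F) * sum over j != i of (F_j - F), as in Eq. (2)."""
    F = ensemble_output(outputs)
    others = sum(outputs[j] - F for j in range(len(outputs)) if j != i)
    return (outputs[i] - F) * others

def pattern_error(outputs, i, y, lam):
    """E_i(n) = 0.5 * (F_i - y)^2 + lam * p_i, the summand of Eq. (1)."""
    return 0.5 * (outputs[i] - y) ** 2 + lam * penalty(outputs, i)

# Hypothetical outputs of three networks on one training pattern.
outputs = [0.2, 0.5, 0.8]
```

Since the deviations from the ensemble output sum to zero, the penalty of network $i$ equals $-(F_i(n) - F(n))^2$: it is most negative for the networks that differ most from the ensemble, which is how minimising it encourages diversity.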

The partial derivative of $E_i$ with respect to the output of network $i$ on the
$n$th training pattern is

$$\frac{\partial E_i(n)}{\partial F_i(n)} = F_i(n) - y(n) + \lambda \sum_{j \ne i} \bigl(F_j(n) - F(n)\bigr) = F_i(n) - y(n) - \lambda \bigl(F_i(n) - F(n)\bigr) \qquad (3)$$

where we have made use of the assumption that $F(n)$ has a constant value with
respect to $F_i(n)$; the second equality holds because the deviations
$F_j(n) - F(n)$ sum to zero over the whole ensemble. The back-propagation (BP)
algorithm has been used for weight adjustments in the mode of
pattern-by-pattern updating. That is, weight updating of all the individual
networks is performed simultaneously using Eq. (3) after the presentation of
each training case. One complete presentation of the entire training set during
the learning process is called an epoch. Negative correlation learning based on
Eq. (3) is a simple extension of the standard BP algorithm. In fact, the only
modification needed is to calculate an extra term of the form
$\lambda (F_i(n) - F(n))$ for the $i$th NN.

From Eqs. (1), (2), and (3), we may make the following observations:

1. During the training process, all the individual networks interact with each
other through their penalty terms in the error functions. Each network $F_i$
minimises the difference between $F_i(n)$ and $y(n)$ while maximising the
difference between $F_i(n)$ and $F(n)$. That is, negative correlation learning
takes into account the errors that all the other NNs have learned while
training an NN.

2. For $\lambda = 0$, there are no correlation penalty terms in the error
functions of the individual networks, and the individual networks are just
trained independently using BP. That is, independent training using BP for the
individual networks is a special case of negative correlation learning.

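3. For $\lambda = 1$, from Eq. (3) we get

$$\frac{\partial E_i(n)}{\partial F_i(n)} = F(n) - y(n) \qquad (4)$$

Note that the empirical risk function of the ensemble of $M$ networks for the
$n$th training pattern is defined by

$$E_{ensemble} = \frac{1}{2}\left(\frac{1}{M}\sum_{j=1}^{M} F_j(n) - y(n)\right)^2 \qquad (5)$$

The partial derivative of $E_{ensemble}$ with respect to $F_i$ on the $n$th
training pattern is

$$\frac{\partial E_{ensemble}}{\partial F_i(n)} = \frac{1}{M}\left(\frac{1}{M}\sum_{j=1}^{M} F_j(n) - y(n)\right) = \frac{1}{M}\bigl(F(n) - y(n)\bigr) \qquad (6)$$

In this case, we get

$$\frac{\partial E_i(n)}{\partial F_i(n)} \propto \frac{\partial E_{ensemble}}{\partial F_i(n)} \qquad (7)$$

The minimisation of the empirical risk function of the ensemble is achieved
by minimising the error functions of the individual networks. From this point
of view, negative correlation learning provides a novel way to decompose the
learning task of the ensemble into a number of subtasks for each individual.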
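The two special cases in the observations above can be verified numerically. The following sketch evaluates the gradient of Eq. (3) for $\lambda = 0$ and $\lambda = 1$ and compares the latter with the ensemble gradient of Eq. (6); the function names and the sample values are illustrative assumptions.

```python
def ensemble_output(outputs):
    """F(n): simple average of the individual outputs."""
    return sum(outputs) / len(outputs)

def ncl_gradient(outputs, i, y, lam):
    """Eq. (3): (F_i - y) - lam * (F_i - F), with F treated as a constant."""
    F = ensemble_output(outputs)
    return (outputs[i] - y) - lam * (outputs[i] - F)

def ensemble_gradient(outputs, y):
    """Eq. (6): (F - y) / M, identical for every network i."""
    return (ensemble_output(outputs) - y) / len(outputs)

# Hypothetical outputs of three networks and a desired output.
outputs, y = [0.2, 0.5, 0.8], 0.4
M = len(outputs)

for i in range(M):
    independent = ncl_gradient(outputs, i, y, lam=0.0)  # plain BP gradient
    coupled = ncl_gradient(outputs, i, y, lam=1.0)      # equals F - y
    print(i, independent, coupled, M * ensemble_gradient(outputs, y))
```

For $\lambda = 1$ every network receives the same gradient $F(n) - y(n)$, which is $M$ times the ensemble gradient, so the proportionality in Eq. (7) holds with constant $M$.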
3 Experimental Studies

3.1 The Mackey-Glass Time Series Prediction Problem

The Mackey-Glass time series investigated here is generated by the following
differential equation

$$\dot{x}(t) = \beta x(t) + \frac{\alpha x(t - \tau)}{1 + x^{10}(t - \tau)} \qquad (8)$$

where $\alpha = 0.2$, $\beta = -0.1$, $\tau = 17$ [4,5]. As mentioned by Martinetz et al. [B],
$x(t)$ is quasi-periodic and chaotic with a fractal attractor dimension 2.1 for the
above parameters.



Experimental setup The input consists of four past data points, $x(t)$, $x(t - 6)$,
$x(t - 12)$ and $x(t - 18)$. The output is $x(t + 6)$. In order to make multiple-step
predictions (i.e., $\Delta t = 90$) during testing, iterative predictions of $x(t + 6)$,
$x(t + 12)$, ..., $x(t + 90)$ are made. During training, the true value of $x(t + 6)$
was used as the target value. This experimental setup is the same as that used
by Martinetz et al. [B].
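The input window and the iterated prediction described above can be sketched as follows. The helper names are hypothetical, and `predict` stands for any trained one-step model that maps the four inputs to an estimate of $x(t + 6)$.

```python
def make_patterns(series, start, end, lag=6, n_lags=4):
    """Inputs (x(t), x(t-6), x(t-12), x(t-18)) and target x(t+6) for t in [start, end)."""
    patterns = []
    for t in range(start, end):
        inputs = [series[t - k * lag] for k in range(n_lags)]
        patterns.append((inputs, series[t + lag]))
    return patterns

def iterate_prediction(predict, series, t, horizon=90, lag=6, n_lags=4):
    """Feed each prediction back as an input until x(t + horizon) is reached."""
    # window[0] is the most recent value; it starts from the true past values.
    window = [series[t - k * lag] for k in range(n_lags)]
    for _ in range(horizon // lag):
        next_value = predict(window)
        window = [next_value] + window[:-1]
    return window[0]
```

With a horizon of 90 and a lag of 6 the model is applied 15 times, and from the fourth application onward every input is itself a prediction, so errors can accumulate along the chain.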

In the following experiments, the data for the Mackey-Glass time series
were obtained by applying the fourth-order Runge-Kutta method to Eq. (8) with
initial condition $x(0) = 1.2$, $x(t - \tau) = 0$ for $0 \le t < \tau$, and a time step of 1.
The training set consisted of points 118 to 617 (i.e., 500 training patterns). The
following 500 data points (starting from point 618) were used as the testing set.
The values of the training and testing data were rescaled linearly to between 0.1
and 0.9. This experimental setup was adopted in order to facilitate comparison
with other existing work.
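A rough sketch of this data preparation is given below. It is not the generator used for the reported experiments: holding the delayed value $x(t - \tau)$ fixed within each Runge-Kutta step and rescaling the training and testing points jointly are both simplifying assumptions, and the function names are illustrative.

```python
def mackey_glass(n_points, alpha=0.2, beta=-0.1, tau=17, x0=1.2):
    """Integrate Eq. (8) with fourth-order Runge-Kutta and a time step of 1.

    Assumption: the delayed value x(t - tau) is held fixed during each step,
    and x(t) = 0 for t < 0.
    """
    x = [x0]
    for t in range(n_points - 1):
        delayed = x[t - tau] if t >= tau else 0.0
        drive = alpha * delayed / (1.0 + delayed ** 10)

        def slope(value):
            return beta * value + drive

        k1 = slope(x[t])
        k2 = slope(x[t] + 0.5 * k1)
        k3 = slope(x[t] + 0.5 * k2)
        k4 = slope(x[t] + k3)
        x.append(x[t] + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return x

def rescale(values, lo=0.1, hi=0.9):
    """Map values linearly so that the minimum becomes lo and the maximum hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

series = mackey_glass(1118)
scaled = rescale(series[118:1118])   # joint rescaling is an assumption
train, test = scaled[:500], scaled[500:]
```

Here points 118 to 617 form the training set and points 618 to 1117 the testing set, matching the split described above.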

The normalised root-mean-square (RMS) error $E$ was used to evaluate the
performance of NCNNs. It is determined by the RMS value of the absolute
prediction error for $\Delta t = 6$, divided by the standard deviation of $x(t)$ [4,6],

$$E = \frac{\left\langle \left[x_{pred}(t, \Delta t) - x(t + \Delta t)\right]^2 \right\rangle^{1/2}}{\left\langle \left(x - \langle x \rangle\right)^2 \right\rangle^{1/2}} \qquad (9)$$

where $x_{pred}(t, \Delta t)$ is the prediction of $x(t + \Delta t)$ from the current state $x(t)$ and
$\langle x \rangle$ represents the expectation of $x$. As indicated by Farmer and Sidorowich [1],
“If $E = 0$, the predictions are perfect; $E = 1$ indicates that the performance is
no better than a constant predictor $x_{pred}(t, \Delta t) = \langle x \rangle$.”
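The error measure of Eq. (9) is simple to compute. In the sketch below the expectations are replaced by averages over the evaluated targets, which is an assumption about how the measure is applied; the function name is illustrative.

```python
import math

def normalised_rms(predictions, targets):
    """RMS prediction error divided by the standard deviation of the targets."""
    n = len(targets)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    mean = sum(targets) / n
    variance = sum((t - mean) ** 2 for t in targets) / n
    return math.sqrt(mse) / math.sqrt(variance)
```

A perfect predictor gives 0, and a predictor that always returns the mean of the targets gives exactly 1, in line with the interpretation quoted above.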

The ensemble architecture used in the experiments has 20 individual
networks. Each individual network is a feedforward NN with one hidden layer.
Both the hidden node function and the output node function are defined by the
logistic function

$$f(y) = \frac{1}{1 + \exp(-y)} \qquad (10)$$

All the individual networks have 6 hidden nodes. The number of training epochs
was set to 10000. The strength parameter $\lambda$ was set to 1.0.
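The architecture just described, 20 networks with four inputs, six logistic hidden nodes and one logistic output node each, can be sketched as a forward pass. The weight layout, the initial weight range `scale`, and the function names are assumptions for illustration; training is omitted.

```python
import math
import random

def logistic(y):
    """Eq. (10): f(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def init_network(n_in, n_hidden, rng, scale=0.5):
    """Random weights; each row holds the incoming weights followed by a bias."""
    hidden = [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-scale, scale) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(network, inputs):
    """Output F_i(n) of one network with a single hidden layer."""
    hidden, output = network
    h = [logistic(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
         for row in hidden]
    return logistic(sum(w * v for w, v in zip(output[:-1], h)) + output[-1])

def ensemble_forward(networks, inputs):
    """Ensemble output F(n): the simple average of the individual outputs."""
    return sum(forward(net, inputs) for net in networks) / len(networks)

rng = random.Random(0)
ensemble = [init_network(4, 6, rng) for _ in range(20)]
prediction = ensemble_forward(ensemble, [0.3, 0.5, 0.7, 0.4])
```

Because the output node is logistic, every prediction lies strictly between 0 and 1, which is compatible with targets rescaled to the range 0.1 to 0.9.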
Experimental Results and Comparisons Table 1 shows the average results
of NCNNs over 50 runs. Each run of NCNNs started from different initial weights.
Fig. 1 shows the best results of NCNNs on the training and testing sets. Table 2
compares NCNNs’ results with those produced by EPNet [7], BP learning and
the cascade-correlation (CC) learning [8]. NCNNs achieve better generalisation
performance than the other methods.




Fig. 1. The Mackey-Glass time series prediction problem. The system’s and the best
NCNNs’ outputs on the training set (left). The system’s and the best NCNNs’ outputs
on the testing set (right). The time span is $\Delta t = 6$.



For a large time span $\Delta t = 90$, NCNNs’ results also compare favorably with
those produced by Martinetz et al., which had been shown to be better than
those of Moody and Darken. For the same training set size of 500 data points, the
smallest prediction error achieved by “neural-gas” networks was about 0.06.
The smallest prediction error among 50 runs of NCNNs was 0.0305 and the
average was 0.0367, while the largest was 0.0458 (Table 1).





Testing RMS       Mean     Std Dev  Min      Max
$\Delta t = 6$    0.0100   0.0006   0.0090   0.0116
$\Delta t = 84$   0.0326   0.0033   0.0273   0.0400
$\Delta t = 90$   0.0367   0.0038   0.0305   0.0458

Table 1. The average results produced by negative correlation learning over 50 runs
for the Mackey-Glass time series prediction problem. The “Testing RMS” in the table
refers to the error defined by Eq. (9) on the testing set.



                  NCNNs    EPNet    BP       CC Learning
$\Delta t = 6$    0.01     0.02     0.02     0.06
$\Delta t = 84$   0.03     0.06     0.05     0.32

Table 2. Comparison of generalisation errors among NCNNs, EPNet [7], BP learning
and the cascade-correlation (CC) learning [8] for the Mackey-Glass time series
prediction problem. The generalisation error refers to the error defined by Eq. (9)
on the testing set.



3.2 Chlorophyll-a Prediction 

Recknagel has recently proposed to use feed-forward NNs to predict
chlorophyll-a in Lake Kasumigaura. The experimental results reported by
Recknagel [10] were very promising, although more improvements can be made.



Experimental setup In order to compare our results with previous work, we
have followed as closely as possible the experimental setup described in
[10]. The limnological time series for the 10 years between 1984 and 1993,
inclusive, were used in our experiments. The experiment was divided into two
parts. The first part used the data from 1984 and 1985 as the training data to
train the first NN ensemble. Then the NN ensemble was tested on the 1986 data.
The second part of the experiment used the data between 1987 and 1992 as the
training data to train the second NN ensemble. Then the NN ensemble was tested
on the 1993 data. Using the 1986 and 1993 data as the independent testing data
was suggested by Recknagel [10] because they represent typical years for blooms
of Microcystis and Oscillatoria, respectively. More details about these data
were given in [10].

As a preprocessing step, the original data were rescaled linearly to between
0.1 and 0.9. The input to each individual network consists of the 8 current
input conditions and the output conditions of the past seven days. It should be
pointed out that Recknagel [10] used a 5-vector input layer which included the
current input conditions and the input conditions of 10, 20, 30, and 40 days
previously. The reasons for changing the input are to decrease the number of
input attributes and to make the chlorophyll-a prediction problem more
meaningful in the sense of time series prediction.

The normalised root-mean-square (RMS) error $E$ defined in Eq. (9) for
$\Delta t = 1$ was used to evaluate the performance of negative correlation learning.
The ensemble architecture used in the experiments has 20 individual networks.
Each individual network is a feedforward NN with one hidden layer. Both the
hidden node function and the output node function are defined by the logistic
function in Eq. (10). All the individual networks have 6 hidden nodes. The
number of training epochs was set to 2000 for the first part of the experiment,
and 1000 for the second part of the experiment. The strength parameter $\lambda$ was
set to 1.0.

Experimental Results Table 3 shows the average results of NCNNs over 50
runs for the chlorophyll-a prediction problem. Each run of NCNNs started from
different initial weights. Fig. 2 shows the best predictions of NCNNs along with
the observed values. As can be seen, the predictions of NCNNs are remarkably
accurate, except that the peak magnitudes in 1986 were slightly underestimated.
NCNNs also outperformed the standard feedforward NNs for the chlorophyll-a
prediction problem [10].



Year          Mean     Std Dev  Min      Max
1984 - 1985   0.0104   0.0003   0.0099   0.0111
1986          0.0812   0.0044   0.0740   0.0902
1987 - 1992   0.0213   0.0005   0.0206   0.0228
1993          0.1051   0.0016   0.1025   0.1094

Table 3. The average results of RMS errors produced by NCNNs over 50 runs for the
chlorophyll-a prediction in Lake Kasumigaura.




[Figure panels: 1984-1985, 1987-1992, 1986, 1993]

Fig. 2. The observed outputs and the best NCNNs’ outputs for the chlorophyll-a
prediction in Lake Kasumigaura. The time span is $\Delta t = 1$.



4 Conclusions

This paper introduces NCNNs and applies them to two time series prediction
problems, i.e., the Mackey-Glass differential equation and the chlorophyll-a
prediction in Lake Kasumigaura. Accurate prediction of chlorophyll-a and other
blue-green algae in fresh water lakes can provide ecologists and environmen- 
talists with valuable information for controlling major outbreaks of these algae 
and protecting the environment. Very good results were obtained by NCNNs in 
comparison with other algorithms. 

NCNNs provide a very promising and competitive alternative to designing 
NN ensembles manually. However, no special considerations were made in the 
optimisation of the size of the ensemble in this paper. It would be desirable 
to develop a learning algorithm which can vary the ensemble size dynamically. 
Preliminary work on this research has already started.



Abstract. This paper reports work on the development of an animation
system for visualising the optimisation process of the Genetic
Algorithm. A description of the requirements and structure of the
system is presented. The developed system is applied to visualise
six test cases. Sequences of animation shots of the evolution process
for solving the Branin RCOS problem and the Schaffer-6 problem
are presented. In the latter example, the effect of a solution acceleration
technique is also demonstrated. The method of building the visualisation
system can be applied to other evolutionary computation techniques.

Keywords: Genetic Algorithm, animation, graphic user interface (GUI),
optimisation



1 Introduction 

In the areas of science, engineering and economics, there are many optimisation
problems which are highly non-convex and for which global optimum solutions
are required. Owing to their ability to seek the global or near-global optimum
solution, Genetic Algorithms (GAs) have been applied to solving optimisation
problems in image processing, VLSI design, robotics systems, transportation,
and power engineering [10,11,12,13].

Genetic algorithms (GAs) are an adaptive searching technique in the field of 
evolutionary computation based on the mechanics of natural genetics and natu- 
ral selection. The success of the application of GAs to an optimisation problem 
depends on the design of (a) representation of chromosomes, (b) fitness function, 
(c) method of crossover, (d) mutation operation and (e) measures and techniques 
to ensure robustness of the GA evolution process for the determination of the 
optimum solution. Besides these factors, the performance of GAs depends on
the parameter settings of the population size, crossover rate and mutation rate.

To ease the design of the components above and to examine the effectiveness
of the measures and techniques in (e), a visualisation tool will be very helpful.
In such a tool, the evolution processes of GAs need to be animated. While the
evolution processes of GAs are difficult to predict, particularly during the
development phase of GA-based optimisation algorithms, visualisation of the
evolution processes will make them transparent. Owing to the transparency
provided, deeper insight into the performance of the algorithm under
development can be obtained and hence the development time can be shortened.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 341-348, 1999.

© Springer-Verlag Berlin Heidelberg 1999

The animation of the GA evolution process will help to determine the
appropriateness of the values of the parameter settings, or the appropriate
points in the process at which the settings of the parameters need to be
changed. Furthermore, it will provide better insight into the distribution and
diversity of chromosomes in the population.

While work on a two-dimensional data matrix representation of the search
space for visualisation purposes has recently been reported, this paper is
devoted to the development of an animation system for visualising the evolution
of chromosomes during the evolution process. It describes the requirements and
structure of the system. The developed system is applied to visualise six
test cases. Sequences of animation shots of the evolution process for
optimising the Branin RCOS problem and the Schaffer-6 problem are presented.
In the latter example, the effect of a solution acceleration technique is also
demonstrated. The method of building the visualisation system can be applied
to optimisation processes based on other evolutionary computation techniques.



2 Requirements of animating system 

An animating system for the GA evolutionary process should be able to display: 

— population of chromosomes; 

— distribution of the chromosomes; 

— display of search space in two- and three-dimensional space for up-to three 
variable problems; 

— sequential change in the distribution of the chromosomes in a population 
from one generation to the next; 

— crossover action in forming a child chromosome; 

— mutation action in forming a child chromosome; 

— convergence characteristics of the GA optimisation process; 

— background information and relevant terminology. 

Moreover, an animating system should have the ability to allow interactive set- 
ting of the parameters for executing the evolution process. 
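To make these requirements concrete, the sketch below shows the kind of per-generation record that such a display could draw from: the population, the best cost, how often each chromosome was selected, and which children were mutated. It is only an illustration written in Python, not the system described in this paper; the real-coded chromosomes, binary tournament selection, arithmetic crossover, and all names and default values are assumptions.

```python
import random

def run_ga(objective, bounds, pop_size=20, generations=20,
           p_cross=0.8, p_mut=0.1, elitism=True, seed=0):
    """Minimise `objective`; return one record per generation for later display."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for gen in range(generations):
        costs = [objective(*ind) for ind in pop]
        best = min(range(pop_size), key=costs.__getitem__)
        selected = [0] * pop_size     # times each chromosome was picked as a parent
        mutated = [False] * pop_size  # whether each child was mutated
        children = []
        for c in range(pop_size):
            parents = []
            for _ in range(2):  # binary tournament selection
                a, b = rng.randrange(pop_size), rng.randrange(pop_size)
                winner = a if costs[a] <= costs[b] else b
                selected[winner] += 1
                parents.append(pop[winner])
            child = list(parents[0])
            if rng.random() < p_cross:  # arithmetic crossover
                w = rng.random()
                child = [w * u + (1.0 - w) * v for u, v in zip(*parents)]
            if rng.random() < p_mut:    # reset one gene uniformly at random
                g = rng.randrange(len(bounds))
                child[g] = rng.uniform(*bounds[g])
                mutated[c] = True
            children.append(child)
        if elitism:  # carry the best chromosome over unchanged
            children[0] = list(pop[best])
            mutated[0] = False
        history.append({"generation": gen,
                        "population": [list(ind) for ind in pop],
                        "best_cost": costs[best],
                        "selected": selected,
                        "mutated": mutated})
        pop = children
    return history
```

Plotting `best_cost` against `generation` gives a convergence curve, while `population`, `selected` and `mutated` supply the data for distribution and operator displays.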



3 Design and Structure of the GA Animation System 

A system called the Genetic Algorithm Animation System (GAAS) has been
designed and developed to fulfil the requirements in Section 2. GAAS is built
using the high performance language MATLAB (version 5.2), a sophisticated
software package for numeric computation and data visualization. By making use
of the sophisticated graphic user interfaces (GUIs) provided by MATLAB, GAAS is
designed to provide a visually rich environment for conveying information,
allowing more intimate interaction between users and machines. Rather than the
one-way path from keyboard to video screen, GAAS allows the user to interact
directly with the objects on the display. The overall structure of GAAS is
shown in Fig. 1.



[Block diagram: main window; parameters setting; info window & tutorials; search
space; animation selection; perform GA; crossover statistics; mutation
statistics; selection statistics; convergence characteristics]

Fig. 1. Block diagram of Genetic Algorithm Animation System



With reference to Fig. 1, when GAAS is initiated, a main window will be
displayed. From the main window, the user is allowed to (a) set parameters, (b)
select a test case, (c) select the dimension of the search space, (d) access the
information on selected test cases, (e) access the tutorials on GA, (f) select the
display on genetic operations and convergence characteristics, and (g) execute
the animation process. Fig. 2 gives an actual GAAS display of the main window,
the parameter setting control panel, test case indicator and menu bar.

GAAS is designed and implemented following the hierarchical structure shown
in Fig. 3. In GAAS, the main window is the Root Figure object depicted in
Fig. 3. As shown in Fig. 2, the main window is responsible for taking parameter
settings, executing numerical calculation and graphical animation, and initiating
the display of tutorials and information. A number of key objects have been
developed and built into the main window. The key objects include:

— 2D (Radio button): Enable or disable the display for 2D search space;

— 3D (Radio button): Enable or disable the display for 3D search space;

— Acceleration (Check box): Enable or disable the acceleration technique;

— Animations (User interface menu): Allows selection of pop-up windows for
animating crossover, mutation and selection operations. The animation of
convergence behavior can also be selected from this object;

— Animated computation (Radio button): Enable or disable animation;

— Case Selection (Pop-up menu): Provides 6 built-in test cases including De
Jong function 1, De Jong function 2, Schaffer function 6, Schaffer function
7, Six-hump camel back function and Branin RCOS function;

— Elitism (Check box): Enable or disable the elitism scheme;

— Info (Push button): Invoke a pop-up window describing the selected test case;

— Tutorials (User interface menu): Allows selection of pop-up windows for
tutorials on Genetic Algorithms and related technical jargon.

Fig. 2. A sample display of the Genetic Algorithm Animation System

Fig. 3. Hierarchical Structure used in GAAS Design

Four pop-up windows have been designed and implemented for animating key
genetic operations. These four windows are Crossover Animation, Mutation
Animation, Selection Animation and Convergence Animation. A pop-up help
window is also provided and can be invoked from the Tutorials menu to display
the definitions of relevant evolutionary computation terms.

In the implementation of GAAS, the object-oriented programming methodology
has been used. The implementation details are omitted here.



4 Application Examples 

GAAS has been applied to visualise the evolution process of the Standard GA
(SGA) in optimising some well-known functions given in [15]. The authors have
previously developed a solution acceleration method for improving the
performance of SGA. The acceleration method, referred to as acceleration
scheme (a), can be found in [16,17]. This method has been incorporated into an
SGA program from the literature to form the accelerated GA and has also been
applied to optimise the functions mentioned above. This section presents the
animation shots in the optimisation processes when searching for solutions to
the Branin RCOS problem. To illustrate the effect of acceleration, the
animation results obtained by GAAS for the Schaffer-6 problem are also
presented.

(i) Solution searching for the Branin RCOS problem

The objective function of the Branin RCOS problem, given by Eq. (1), has the
global minimum value of 0.397887 at $(x_1, x_2) = (-\pi, 12.275)$, $(\pi, 2.275)$, or
$(9.425, 2.475)$.

$$F(x_1, x_2) = \left(x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_1) + 10 \qquad (1)$$
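As a quick check of the stated minima, the objective function can be evaluated directly. The sketch below is written in Python for illustration only; it uses the standard form of the Branin RCOS function, and the function and variable names are assumptions.

```python
import math

def branin_rcos(x1, x2):
    """Branin RCOS objective function in its standard form."""
    a = 5.1 / (4.0 * math.pi ** 2)
    b = 5.0 / math.pi
    c = 1.0 - 1.0 / (8.0 * math.pi)
    return (x2 - a * x1 ** 2 + b * x1 - 6.0) ** 2 + 10.0 * c * math.cos(x1) + 10.0

# The three global minimum points quoted in the text.
minima = [(-math.pi, 12.275), (math.pi, 2.275), (9.425, 2.475)]
values = [branin_rcos(x1, x2) for x1, x2 in minima]
```

At each of the three points the squared term vanishes and the cosine equals -1 (to within the rounding of 9.425), so the value reduces to $10/(8\pi) \approx 0.397887$.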

The chromosomes in the initial population spread widely over the search space
in terms of the $x_1$- and $x_2$-axes as shown in Fig. 4(a). However, it can be
observed that there are two initial chromosomes located very close to the third
solution point near the bottom-left corner of the figure. This visualisation
indicates that the evolution process may be attracted to the third solution
point. The distribution of the chromosomes in the 2nd generation in Fig. 4(b)
shows that the chromosomes are attracted to the second and the third solution
points. A further generation sees the solution process evolve towards the third
solution point as shown in Fig. 4(c). The third optimal solution is obtained in
the 7th generation. The chromosome distribution at the 20th generation is shown
in Fig. 4(d), giving confirmation of the optimal solution. To capture all the
optimal solutions, clustering algorithms can be included in the evolution
process.

Statistics of the number of mutations performed and the number of times a
chromosome has been selected for crossover in the 20th generation are displayed
as shown in Figs. 4(e) and (f). In Fig. 4(e), a bar represents a chromosome
and a black bar represents a mutated chromosome. In Fig. 4(f), the digits in
the odd columns give the index of a chromosome. In the even columns, the digit
gives the number of times the chromosome has been selected for crossover
while the symbol ’x’ indicates no selection.




[Figure panel: (a) initial generation]

Fig. 4. Evolution process and statistics in solving the Branin RCOS problem

Fig. 5. Convergence Characteristic

Fig. 5 shows the convergence characteristics of the process. From the statistics
of chromosome selection, crossover and mutation given in the animation windows
such as those shown in Fig. 4 and the convergence characteristic given in Fig. 5,
the user will be able to examine the effectiveness of the evolutionary process
at any generation and hence evaluate the appropriateness of the settings of the
GA parameters and the performance of the various genetic operations.

(ii) Solution searching for the Schaffer-6 problem

To demonstrate the visualisation of the acceleration effect in [16,17], the
Schaffer-6 problem is solved here. The objective function is given in Eq. (2)
and the global minimum value is zero at $(x_1, x_2) = (0, 0)$.

$$F(x_1, x_2) = 0.5 + \frac{\sin^2\!\left(\sqrt{x_1^2 + x_2^2}\right) - 0.5}{\left(1 + 0.001\,(x_1^2 + x_2^2)\right)^2} \qquad (2)$$
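The Schaffer-6 function can be evaluated in the same way. The sketch below assumes the standard form of the function, with a squared denominator, and is again written in Python purely for illustration; the function name is an assumption.

```python
import math

def schaffer6(x1, x2):
    """Schaffer function 6 in its standard form (squared denominator)."""
    r2 = x1 ** 2 + x2 ** 2
    return 0.5 + (math.sin(math.sqrt(r2)) ** 2 - 0.5) / (1.0 + 0.001 * r2) ** 2

origin_value = schaffer6(0.0, 0.0)
nearby_value = schaffer6(3.0, 4.0)
```

The value at the origin is exactly zero. Away from the origin the function oscillates with the distance from the origin, producing rings of local minima around the global one, which is why a population can hop around without approaching the optimum.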



The evolution of a population of chromosomes through the first four generations
without and with the acceleration scheme is shown in Figs. 6 and 7, respectively.
In these figures, the vertical axis is for the value of the objective function
and the other two axes are for $x_1$ and $x_2$. Without the acceleration, as shown
in Figs. 6(a)-(d), the chromosomes hop around in the search space without
getting close to the optimum point. However, in Figs. 7(a)-(d), when solution
acceleration is enabled, the chromosomes move rapidly towards the optimum point
immediately after the 1st generation; the optimum point is almost reached at the
4th generation by the fittest chromosome. The same phenomenon was observed when
the problem was solved by separate executions of the accelerated GA algorithm.

[Figure panels: (a) 1st generation (b) 2nd generation (c) 3rd generation (d) 4th generation]

Fig. 7. Evolution process with acceleration in solving the Schaffer-6 problem



5 Conclusions 

A GA animation system, GAAS, has been designed and developed using the
object-oriented programming methodology. The structure and facilities of GAAS
have been reported. The multiple-window displays that the system provides
enable a clear visualisation of the evolution process of GA and statistics of
some of the important GA operations. Two optimisation problems, Branin RCOS and
Schaffer-6, have been used to demonstrate the usefulness of the developed
animation system. This system will be very useful for education purposes and
for advancing the research into robust evolutionary algorithms. Further
development of GAAS is currently being undertaken to include more functions and
facilities.



Abstract. In our previous paper, we investigated the relation among
the mean convergence time, the population size, and the chromosome
length of genetic algorithms (GAs). Our analyses of GAs make use of the
Markov chain formalism based on the Wright-Fisher model, which is a
typical and well-known model in population genetics. The Wright-Fisher
model is characterized by 1-locus, 2-alleles, fixed population size, and
discrete generations. Because of these simple characteristics, it is easy
to evaluate the behavior of the genetic process. We have also given the
mean convergence time under genetic drift. Genetic drift can be well
described in the Wright-Fisher model, and we have determined the
stationary states of the corresponding Markov chain model and the mean
convergence time to reach one of these stationary states. The island model
is also a well-known model in population genetics, and it is similar to
one of the most typical models of parallel GAs, which require a parallel
computer for high performance computing. We have also derived the most
effective migration rate for island model parallel GAs under some
restrictions. The most effective migration rate obtained is a rather small
value, i.e., one immigrant per generation; however, the behavior of island
model parallel GAs at that migration rate has not yet been clearly revealed.
In this paper, we discuss the mean convergence time for island model
parallel GAs using both the exact solution and numerical simulation. As
expected from the analysis of the Wright-Fisher model, the mean convergence
time of island model parallel GAs is proportional to the population size,
and the coefficient is larger for a smaller migration rate. Since keeping
diversity in the population is important for the effective performance of
GAs, convergence of the population has a bad influence on GAs. On the other
hand, the mutation and crossover operations prevent the GA population from
converging. Because a small migration rate weakens the converging force, it
should be effective for GAs. This means that, compared with usual GAs,
island model parallel GAs are more efficient not only in using a large
population size with parallel computers, but also in keeping the diversity
in the population.

Keywords: Markov chain, population genetics, genetic drift, Wright- 
Fisher model, island model parallel genetic algorithms. 
1 Introduction



Genetic algorithms (GAs) are adaptive methods based on the genetic processes
of biological organisms, and were introduced by Holland [?]. They have
succeeded in solving many problems of search, optimization, and machine
learning [4]. It is natural to study the behavior of GAs theoretically,
because we want to know how GAs perform compared with other methods and how
GAs converge to good solutions; the situation is expected to differ from that
of random search methods. Many researchers have therefore studied GAs
theoretically [?]. One of the interesting topics in finite-population GAs is
genetic drift.

In population genetics and GAs, genetic drift is a well-known phenomenon. In
[1][7], genetic drift has been studied with computer simulations. Kimura gave
a mathematical analysis for population genetics through diffusion models [?].

In [?], we derived the most efficient mutation rate for standard GAs and
the most efficient migration rate for island model parallel GAs. In this paper,
we discuss the mean convergence time for the island model parallel GAs. This
paper is organized as follows. Section 2 reviews our former research [10][11],
which is needed to understand the later sections; it is devoted to analyzing
the mean convergence time and the stationary state of the Wright-Fisher model,
which is a model of simple GAs. In Section 3 we consider the island model,
which is often used in parallel GAs. For the details of our analyses of the
Markov chain model, see [?].



2 Genetic Drifts and the Wright-Fisher Model 



Let the population consist of a fixed number n of individuals, each of which
has only one locus; in other words, the length of each chromosome is one. There
are only two different alleles (‘0’ and ‘1’) at the locus, and the state is
denoted by the number of ‘1’s in the population. This genetic model is well
known as the “Wright-Fisher model” in the field of biological population
genetics (e.g. [?]).

Genetic drift is the random fluctuation of gene frequencies caused by the
probabilistic transition from generation to generation in a population of
finite size. It tends to concentrate the population on particular genes (the
convergent states of the Markov chain). This tendency opposes mutation, which
disperses the population over various genes. In this section, we consider the
mean convergence time of the Wright-Fisher model without mutation using
standard Markov chain analysis.

In this case, the convergence of the Wright-Fisher model is driven by genetic
drift. The model has only two alleles, i.e. 0 and 1. In the general case of
more than two alleles, we single out one particular allele in order to use the
Wright-Fisher model. If the total population size is n, then the state of the
population is uniquely specified by the number of 0s, so we define the state
as the number of 0s.
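The drift process described above can be simulated directly. The following is a minimal sketch, assuming binomial resampling of n offspring per generation and no mutation; the function names, the number of runs and the seed are illustrative choices and are not taken from the original study.

```python
import random


def drift_convergence_time(n, k0, rng):
    """Generations until one allele is fixed in a Wright-Fisher population.

    n  : fixed population size
    k0 : initial number of copies of the tracked allele
    """
    k, t = k0, 0
    while 0 < k < n:
        p = k / n
        # Each of the n offspring copies the allele of a random parent.
        k = sum(1 for _ in range(n) if rng.random() < p)
        t += 1
    return t


def mean_convergence_time(n, k0, runs=2000, seed=0):
    """Average convergence time over independent runs (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(drift_convergence_time(n, k0, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # Print the mean convergence time divided by n for a few population sizes.
    for n in (10, 20, 40):
        print(n, mean_convergence_time(n, n // 2) / n)
```

Dividing the mean time by n makes the proportionality to population size visible; for small n the ratio need not match the large-population coefficient exactly.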
2.1 Mean Convergence Time for the Wright-Fisher Model

The mean convergence time of simple GAs is proportional to the population size.

In [1][7], the effects of genetic drift on the mean convergence time have been
studied with numerical experiments. It is shown that the mean convergence time
is proportional to the population size of a model. Theoretically, we can show
that, in the continuous limit such as the large population limit, the mean
convergence time is proportional to the size of the population. According to
Kimura [?], with some tedious calculations, we have the following relation for
the mean convergence time t:

$$t = n \sum_{j=0}^{\infty} \frac{P_{2j}(1-2p) - P_{2j+2}(1-2p)}{(j+1)(2j+1)}, \qquad (1)$$

where P_k(·) are the Legendre polynomials, and p is the gene frequency at the
initial state. This shows that the mean convergence time is proportional to the
population size. The derivation of eq. (1) is too involved to explain here; see
[?] and [?] for details. As shown in Table 1, the theoretical values from
eq. (1) coincide with the results of the numerical experiments [1]. Note that
the right-hand side of eq. (1) is equal to −2{p log p + (1 − p) log(1 − p)}n.



initial state   theoretical analysis   numerical analysis
                from eq. (1)           from [1]
p = 1/2         1.386n                 1.4n
p = 1/4         1.125n                 1.0n
p = 1/8         0.754n                 0.7n



Table 1. Mean convergence time at large population 
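The theoretical column of Table 1 can be checked numerically from the closed form −2{p log p + (1 − p) log(1 − p)} quoted in the text. The sketch below also sums a Legendre series of the form Σ_j [P_2j(1 − 2p) − P_2j+2(1 − 2p)] / ((j + 1)(2j + 1)); treating this as the full form of eq. (1) is an assumption, and the function names and the truncation at 2000 terms are illustrative.

```python
import math


def coeff_closed(p):
    """Coefficient of n in the closed form -2{p log p + (1-p) log(1-p)}."""
    return -2.0 * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def legendre_values(z, kmax):
    """Return [P_0(z), ..., P_kmax(z)] using Bonnet's recurrence."""
    vals = [1.0, z]
    for m in range(1, kmax):
        vals.append(((2 * m + 1) * z * vals[m] - m * vals[m - 1]) / (m + 1))
    return vals[: kmax + 1]


def coeff_series(p, terms=2000):
    """Truncated Legendre series (assumed form of eq. (1), divided by n)."""
    P = legendre_values(1.0 - 2.0 * p, 2 * terms)
    return sum(
        (P[2 * j] - P[2 * j + 2]) / ((j + 1) * (2 * j + 1)) for j in range(terms)
    )


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.125):
        print(p, round(coeff_closed(p), 3), round(coeff_series(p), 3))
```

The closed form is cheap to evaluate; the series is included only to illustrate that the two expressions agree to within the truncation error.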



2.2 Stationary State for the Wright-Fisher Model 

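Here we consider the Wright-Fisher model with mutation. Then the transition
matrix Q, each element of which is the transition probability from state i to
state j, is given as

$$Q_{ij} = \binom{n}{j} \left\{ (1-2\mu)\frac{i}{n} + \mu \right\}^{j} \left\{ (1-2\mu)\left(1-\frac{i}{n}\right) + \mu \right\}^{n-j}, \qquad (2)$$

where μ is the mutation rate.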



Since the largest eigenvalue of Q is 1, we get the density function of the
stationary state by normalizing the eigenvector for the eigenvalue 1 [5].
Letting the population size n tend to infinity, the density function of the
stationary state is given as

$$P_j(n,\mu) \simeq \frac{\Gamma(4n\mu)}{\{\Gamma(2n\mu)\}^{2}} \left\{ \frac{j}{n}\left(1-\frac{j}{n}\right) \right\}^{2n\mu-1}. \qquad (3)$$



Figure 1 shows the shape of the density function of the stationary state in
the large population limit. From eq. (3), we see that the mutation rate
μ = 1/(2n) makes the density function flat. This suggests that this mutation
rate makes GAs work well: if the mutation rate is large, it is hard for GAs to
reach stationary results, and if the mutation rate is small, GAs converge
easily to a certain value which might not be an optimal result. When the
mutation rate μ is 1/(2n), all of the states have the same probability. At that
mutation rate, the density function can be expected to take the same shape as
the fitness function under consideration.

Fig. 1. The density function of the stationary states: the large population limit

In this consideration, we did not take the influence of the chromosome length
into account. This is because the gene at each locus behaves as described
here. Furthermore, since the reciprocal effect spanning plural loci, i.e.
epistasis, depends on the problem to be solved, we cannot describe it without
knowing the fitness function. However, the mutation rate μ = 1/(2n) could serve
as a standard value.
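For a small population, the transition matrix and its stationary distribution can be computed explicitly. The sketch below assumes the standard Wright-Fisher transition probabilities with symmetric mutation at rate mu (binomial sampling with success probability (1 − 2·mu)·i/n + mu) and finds the stationary distribution by power iteration; the function names and the iteration count are illustrative.

```python
import math


def transition_matrix(n, mu):
    """(n+1) x (n+1) Wright-Fisher transition matrix with symmetric mutation."""
    Q = []
    for i in range(n + 1):
        # Probability that one offspring carries the tracked allele.
        p1 = (1.0 - 2.0 * mu) * (i / n) + mu
        Q.append(
            [math.comb(n, j) * p1 ** j * (1.0 - p1) ** (n - j) for j in range(n + 1)]
        )
    return Q


def stationary(Q, iters=2000):
    """Stationary distribution by repeated left-multiplication (power iteration)."""
    m = len(Q)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * Q[i][j] for i in range(m)) for j in range(m)]
    return pi


if __name__ == "__main__":
    n = 20
    for mu in (0.1 / n, 0.5 / n, 2.0 / n):
        pi = stationary(transition_matrix(n, mu))
        # Compare the probability at the boundary with that at the centre.
        print(mu, round(pi[0], 4), round(pi[n // 2], 4))
```

Comparing the boundary and centre probabilities for several mutation rates gives a rough picture of when the stationary distribution is U-shaped, nearly flat, or peaked in the middle.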

3 Island Model Parallel GAs 

Parallel GAs have been investigated since GAs were introduced. The island
model parallel GAs are typical models of parallel GAs [?]. In the island model
parallel GAs, the total population is divided into several subpopulations, and
one processor is allocated to each subpopulation. Each processor runs a simple
GA independently. Inter-processor communication occurs during the migration
phase at regular intervals (i.e. the migration interval). During migration, a
fixed proportion of each subpopulation is selected and sent to another
subpopulation. In return, the same number of migrants is received, and these
replace individuals selected according to some criteria.
3.1 Stationary State for the Island Model

In the limit of subpopulation size n tending to infinity, the density function
of the stationary state is given by [?]:

$$\frac{\Gamma(2nm)}{\Gamma(2nmx)\,\Gamma(2nm(1-x))} \left(\frac{j}{n}\right)^{2nmx-1} \left(1-\frac{j}{n}\right)^{2nm(1-x)-1}, \qquad (4)$$

where n is the number of individuals in a subpopulation, m is the migration
rate, i.e. the ratio of migrants per generation in a subpopulation, and x is
the mean value of j/n over the whole population. Figure 2 shows the density
function of the stationary states.




Fig. 2. The density function of the stationary states: the large population
limit of island model parallel GAs, where the subpopulation size is n and the
mean frequency x of the whole population is 1/2



From eq. (4), the migration rate m is twice as large as the mutation rate μ of
eq. (3) when x is 1/2. That means the migration rate m = 1/n makes the density
function uniform, similar to the case of μ = 1/(2n). This implies that the
migration rate m = 1/n, which means one migrant per generation, makes parallel
GAs work well. This situation is similar to the case of the mutation rate of
standard GAs. However, there are some differences between them. First,
mutation is ignored in eq. (4). Second, we assume that x (the mean value of j/n
over the whole population) is 1/2. Because of these differences, our
expectation may contain a small error. In fact, the migration rates used in
several studies of parallel GAs are larger than 1/n.

Although Manderick et al. [2] tried to determine the most effective migration
rate, they could not determine it, because a small difference in migration rate
does not make a meaningful difference in the performance of the island model
parallel GAs. Since a smaller migration rate gives less communication
overhead, we expect that the migration rate 1/n might drive the island model
parallel GAs effectively.

3.2 Mean Convergence Time for the Island Model 

For the simple model, i.e. the two-island case, we can evaluate the mean
convergence time as a function of the population size and migration rate by
numerical simulation. As in the Wright-Fisher model, the mean convergence time
is proportional to the population size, and the coefficient decreases as the
migration rate increases. We set the initial value of this simulation to half
of the population for each island.



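[Figure 3: mean convergence time plotted against population size, with curves
for migration rates of 1 per generation, 2 per generation and half of the
island population per generation, and for the panmictic case.]

Fig. 3. Mean convergence time vs population size with respect to several
migration rates (two islands; the population size is the total).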



When the population size is small, we can get the exact solution. The number
of states in the two-island case is the product of the numbers of states of
each island. It is too difficult to get the transition matrix in general form.
We evaluated the mean convergence time when the population sizes of one island
are 2 and 4, i.e. the total population sizes are 4 and 8. As the initial state
of this simulation, we set the gene frequency of each island to 1/2. The
simulation results almost coincide with the exact solutions, as shown in
Table 2.

population   migration   simulation   exact solution
2-2          1           6.154        6.167
4-4          1           11.623       11.670
4-4          2           11.240       11.297

Table 2. Mean convergence time for the two-island model. Migration represents
the number of immigrants per generation.

Though the mean convergence time of island model parallel GAs is proportional
to the population size, the constant of proportionality is large when the
migration rate is small. From the numerical simulation, the constants of
proportionality are 1.54, 1.47, and 1.38 for migration of one individual per
generation, two per generation, and half of the island population per
generation, respectively. Since the constant of proportionality of the
ordinary (“panmictic”) GA is 1.386, migration of half of the island population
per generation seems to give behavior similar to that of panmictic GAs.
However, since a small migration rate makes the convergence time long, it
should have a good influence on GAs.
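A two-island experiment of this kind can be sketched as follows. The details are assumptions rather than the procedure used for Table 2 and Fig. 3: each island is resampled independently (Wright-Fisher reproduction), after which a fixed number of randomly chosen individuals is swapped between the islands, and a run ends when the whole population carries a single allele. The function names and parameters are hypothetical, and the numbers produced need not match the published values.

```python
import random


def two_island_time(n, migrants, rng):
    """Generations until both islands are fixed on the same allele.

    n        : individuals per island
    migrants : individuals exchanged per generation (must be >= 1, otherwise
               the islands may fix on different alleles and never agree)
    """
    half = n // 2
    # Each island starts with gene frequency 1/2 (for even n).
    islands = [[1] * half + [0] * (n - half) for _ in range(2)]
    t = 0
    while True:
        total = sum(islands[0]) + sum(islands[1])
        if total == 0 or total == 2 * n:
            return t
        # Reproduction: every offspring copies a random parent of its island.
        islands = [[rng.choice(isl) for _ in range(n)] for isl in islands]
        # Migration: swap randomly chosen individuals between the islands.
        idx_a = rng.sample(range(n), migrants)
        idx_b = rng.sample(range(n), migrants)
        for a, b in zip(idx_a, idx_b):
            islands[0][a], islands[1][b] = islands[1][b], islands[0][a]
        t += 1


def mean_time(n, migrants, runs=300, seed=0):
    """Average convergence time over independent, seeded runs."""
    rng = random.Random(seed)
    return sum(two_island_time(n, migrants, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for n, k in ((2, 1), (4, 1), (4, 2)):
        print(f"{n}-{n}", k, mean_time(n, k))
```

Different conventions for the order of reproduction and migration, or for how migrants are chosen, would change the resulting averages.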



4 Conclusions 



We considered the mean convergence time under genetic drift and gave
reference values for the mutation and migration rates. We want to know the
performance of GAs compared with other methods. Therefore theoretical and
experimental studies of GAs must be performed. The roles of mutation,
crossover, and selection must be made clear and controllable. Mutation and
crossover tend to increase the diversity of the population. Convergence by
reproduction tends to decrease the diversity. So we want to know the critical
point that balances the increase and decrease of diversity in the population,
in order to make the search process of GAs effective. Even though the
Wright-Fisher model is a very simple model, i.e. 1 locus, 2 alleles, fixed
population size, and discrete generations, it has some remarkable features.
From Figure 1, when the mutation rate has the value μ = 1/(2n), the density of
the stationary states tends to be uniform. Furthermore, in the case of island
model parallel GAs, the migration rate m = 1/n makes the density uniform.
Additionally, such a small migration rate as m = 1/n increases the convergence
time. This means that island model parallel GAs are more efficient than usual
GAs, not only in using a large population size but also in keeping diversity
in the population. These reference values might be a key point in determining
mutation rates and migration rates.
Abstract. We present a framework for studying the biases that recurrent
neural networks bring to language processing tasks. A semantic
concept represented by a point in Euclidean space is translated into a 
symbol sequence by an encoder network. This sequence is then presented 
to a decoder network which attempts to translate it back to the original 
concept. We show how a pair of recurrent networks acting as encoder and 
decoder can develop their own symbolic language that is serially trans- 
mitted between them either forwards or backwards. The encoder and 
decoder bring different constraints to the task, and these early results 
indicate that the conflicting nature of these constraints may be reflected 
in the language that ultimately emerges, providing clues to the structure 
of human languages. 

1 Introduction 

The study of automata and the languages they can process has a history dating 
back to Turing and beyond. Entwined with this story is the study of natural 
languages and of the human mind. The issue is essentially one of constraints. 
The constraints on an automaton, such as time and space, place bounds on 
the types of tasks it can perform including the types of languages it can process. 
Likewise, it is believed that the constraints of the human mind are reflected in the 
languages we use, so that by examining the features of language we may better 
understand the principles that guide human language and thought processes. 

Perhaps the best known work relating automata and languages, which also
seems highly relevant to natural languages, is Chomsky’s hierarchy [1].
Chomsky’s hierarchy is a family of language classes that can be recognised by
a corresponding family of automata classes. With different restrictions on the
automata, different language classes may be processed. This hierarchy was
designed with symbolic systems in mind, and it has been suggested that
dynamical systems, including many connectionist models, may bring different
biases to language processing tasks relative to their symbolic counterparts
[?], necessitating a re-evaluation of the automata/language relationship.

We thank Tony Plate and Elizabeth Sklar for helpful discussions. The research was
supported by an APA to BT, a UQ Postdoctoral Fellowship to AB and an ARC
grant to JW.

[Figure 1: the encoder maps a real number x = 0.8125 to a message m (a bit
sequence), here <1,1,0,1>, which the decoder maps back to a real number
y = 0.8125.]

Fig. 1. Getting the point across. Two recurrent networks are used as encoder
and decoder for a communication channel. The encoder is presented with a point
from a subset of Euclidean space, x ∈ U ⊂ ℝ^n, and outputs a sequence of bits,
m ∈ Σ*, Σ = {0, 1}. This sequence of bits is then used as input for the
decoder, which outputs a value y ∈ ℝ^n after the last bit in the sequence has
been processed. If the communication is successful, then y should approximate
x. The example shown uses a numeric encoding.

As well as processing constraints, connectionist models also have learning
constraints. That is, models are limited not only in what they can represent,
but in what they can learn. The distinction between learning and representation
is important when we consider how human languages have developed. For a
natural language to be viable, it must not only be representable by its users,
but also learnable by subsequent generations [?]. The learning and
representational constraints of the human brain dictate the set of languages
humans are able to understand and learn, and consequently the languages that
have emerged.

Recurrent neural networks (RNNs) have shown significant promise as
computational models of various aspects of the human language processing
system. Part of their appeal is the ability to incorporate syntax and semantics
into a single model [?]. They have also demonstrated competence in learning a
wide range of grammatical structures [?], and often reflect real-world data on
natural language tasks [?] and language change [?]. It seems important, then,
to investigate the constraints of recurrent networks and the way that they
influence the properties and emergence of language.

This paper is motivated by the observation that communication is essentially 
a shared task between sender and receiver, in which the kind of language favoured 
by the sender may not be convenient for the receiver and vice-versa. That is, the 
constraints of the sender and receiver may be different. The language that ulti- 
mately emerges may arise as a compromise between these competing interests. 

We consider a simple language task in which two RNNs try to communicate
a semantic “concept” represented by a point in a subset U ⊂ ℝ^n of Euclidean
space. One network sends a message in the form of a sequence of bits, which the
other network decodes back into a point in the same Euclidean space (Fig. 1).
In this paper we consider the case where U is the unit interval [0, 1].
While the task is superficially simplistic, it has some interesting properties.
The “concept” is specified in a continuous space with arbitrary precision, whereas 
the “language” is a sequence of symbols from a finite alphabet. Unlike studies 
that have looked at language emergence between symbolic agents over a sym- 
bolic channel, this task requires a transformation from a concept described with 
arbitrary precision in a continuous space to a symbolic language. A trade-off is 
required between the amount of precision in the concepts and the length of the 
symbol sequences in the language. 

It is possible to accomplish the task by using a numeric encoding, that is, by
interpreting the sequence of bits as its numeric (binary) value. For this
numeric code, two possibilities are immediately obvious: either the most
significant part of the message can be sent first, or it can be sent last. For
example, 0.8125_10 = 0.1101_2 may be sent most-significant-bit (MSB) first as
<1, 1, 0, 1> or least-significant-bit (LSB) first as <1, 0, 1, 1>. This paper
investigates the effect that encoder and decoder constraints have on the way
that the concept space and message sequence can be related.
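The two numeric codes can be written down directly. The following is a minimal sketch of the encoding and decoding as plain functions (not the networks studied here); the function names are illustrative.

```python
def encode_msb(x, n_bits):
    """Binary expansion of x in [0, 1), most significant bit first."""
    bits = []
    for _ in range(n_bits):
        x *= 2
        bit = int(x >= 1.0)
        bits.append(bit)
        x -= bit
    return bits


def encode_lsb(x, n_bits):
    """The same expansion sent least significant bit first."""
    return encode_msb(x, n_bits)[::-1]


def decode_msb(bits):
    """Numeric value of a bit sequence received MSB first."""
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))


def decode_lsb(bits):
    """Numeric value of a bit sequence received LSB first."""
    return decode_msb(bits[::-1])


if __name__ == "__main__":
    print(encode_msb(0.8125, 4))  # MSB first
    print(encode_lsb(0.8125, 4))  # LSB first
    print(decode_lsb(encode_lsb(0.8125, 4)))
```

For the example value 0.8125 with four bits, the two encoders reproduce the MSB-first and LSB-first sequences given in the text, and either decoder recovers the value exactly.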

In Sect. 2, encoder and decoder networks are each trained separately using a 
hill-climbing algorithm to perform the task using the numeric code. In Sect. 3, 
the encoder and decoder are co-evolved together and are at liberty to determine 
their own “language”. We conclude with some remarks relating the results of 
the simulations to features of natural language. 

2 Simulations 1 and 2: Encoders and Decoders 

In the first two series of simulations we investigate the ability of the individ- 
ual encoders and decoders to perform their respective mappings. In total, four 
mappings are considered. 

1. Encoding a real value to an MSB first binary sequence. 

2. Encoding a real value to an LSB first binary sequence. 

3. Decoding from an MSB first binary sequence to a real value. 

4. Decoding from an LSB first binary sequence to a real value. 



2.1 Encoders 

The architecture for the encoder is a simple recurrent network (SRN) with ad- 
ditional connections from the output units to the hidden units. Since a real value 
may require representation by an arbitrarily long binary string, we initially 
intended that the encoder would output an end-of-sequence symbol once the 
number had been encoded. Pilot simulations suggested that this encoding was 
difficult to evolve so the length of sequences was artificially limited. 

Given this general architecture, it is relatively straightforward to hand-code
a network with a single hidden unit to perform the encoding for an MSB first
sequence. Such a network is shown in Fig. 2(a). However, it is not possible to
perform the LSB first encoding without a large number of hidden units due
to the fractal nature of such an encoding. (For messages of n bits, the 2^n
values 0, 1/2^n, ..., (2^n − 1)/2^n are encoded. For any value, the first bit
of output is the opposite to that of its neighbours.)

Fig. 2. (a) MSB encoder: An RNN that takes a real number between 0 and 1 and
encodes it as a numeric string, most significant bit first. The hidden unit
uses a linear threshold activation function that saturates at -1 and 1, whereas
the output units use binary (0.5) threshold units. The input value is presented
at the first time-step only. (b) LSB decoder: An SRN that decodes numeric
sequences LSB first. The input is wrapped with start and end markers. After
presentation of the end marker, the output unit activation corresponds to the
appropriate value. Linear (0,1) threshold activations are used on all units.
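The remark about neighbouring values can be made concrete by listing, for each n-bit value, the first bit that an MSB-first and an LSB-first code would emit. This is an illustrative sketch using integer arithmetic; the function name is hypothetical.

```python
def first_bits(n_bits):
    """For each value k / 2**n_bits, return (value, first MSB bit, first LSB bit)."""
    rows = []
    for k in range(2 ** n_bits):
        msb = k >> (n_bits - 1)  # top bit: a single threshold at 0.5
        lsb = k & 1              # bottom bit: alternates between neighbours
        rows.append((k / 2 ** n_bits, msb, lsb))
    return rows


if __name__ == "__main__":
    for value, msb, lsb in first_bits(3):
        print(f"{value:.3f}  MSB-first starts with {msb}  LSB-first starts with {lsb}")
```

The first MSB bit changes once across the interval, whereas the first LSB bit changes between every pair of adjacent values, and the number of changes doubles with each extra bit of precision.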

Although a solution could be hand-coded, it was unknown whether it was
learnable, so a series of simulations was designed to address this question.
Networks were evolved using a simple hill-climbing algorithm to perform both
the LSB and MSB mappings. A “champion” encoder was created with initially
random weights. A single mutant was then spawned by randomly perturbing the
weights of the champion according to a Gaussian distribution with 0 mean and
initially 0.1 variance. If the mutant was able to encode values as well as, or
better than, the champion, then the mutant became champion and a new mutant
was spawned. To evaluate the accuracy with which values were encoded, the
strings were decoded with a perfect numeric decoder, and the sum squared error
between encoder input and decoder output was calculated.
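The champion/mutant scheme described above can be sketched generically. In the sketch below the network is abstracted to a flat list of weights and the error function is supplied by the caller; the initial weight range, the fixed variance, the function name and the toy error used in the usage example are assumptions, not details from the simulations.

```python
import random


def hill_climb(error, n_weights, generations=1000, variance=0.1, seed=0):
    """Champion/mutant hill climbing on a weight vector.

    error : function mapping a list of weights to a sum squared error
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    # Champion starts with random weights (range chosen for illustration).
    champion = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
    champ_err = error(champion)
    for _ in range(generations):
        # Spawn a single mutant by Gaussian perturbation of every weight.
        mutant = [w + rng.gauss(0.0, sigma) for w in champion]
        mut_err = error(mutant)
        # The mutant replaces the champion if it is at least as good.
        if mut_err <= champ_err:
            champion, champ_err = mutant, mut_err
    return champion, champ_err


if __name__ == "__main__":
    # Toy error: squared distance from a fixed target weight vector.
    target = [0.3, -0.7, 0.5]
    toy_error = lambda w: sum((a - b) ** 2 for a, b in zip(w, target))
    weights, err = hill_climb(toy_error, 3)
    print(weights, err)
```

Accepting mutants that are merely as good as the champion lets the search drift across plateaus of equal error, which matters when the error is computed from thresholded outputs.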

The values chosen to be encoded were selected by taking a staged learning
approach [3]. Initially, only two values, 0 and 0.5, were encoded, and the
number of bits that could be sent was accordingly set to 1. Once a network was
able to perform this mapping, 2 bits could be sent, encoding 0, 0.25, 0.5 and
0.75. In general, after 2^k numbers could be successfully encoded into k bits,
the networks were given 2^(k+1) values to encode into (k+1)-bit sequences. The
variance was modulated throughout the course of the simulations. Simulations
were run for a maximum of 100K generations, or until all 5-bit values could be
encoded. Networks with 1, 2, 3 and 5 hidden units were evolved.
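The staged schedule of training values follows a simple pattern, sketched below; the function name is illustrative.

```python
def stage_values(k):
    """Values presented at the stage where k-bit messages are allowed."""
    return [i / 2 ** k for i in range(2 ** k)]


if __name__ == "__main__":
    # Stages 1 to 5 correspond to 1-bit up to 5-bit precision.
    for k in range(1, 6):
        print(k, len(stage_values(k)), stage_values(k)[:4])
```

Each stage contains all values of the previous stage plus the midpoints between them, so a network that has mastered stage k already encodes half of the values of stage k + 1 up to one additional bit.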

2.2 Decoders 

SRNs were used as decoders. The task for these networks was the inverse
of the encoders’ task with minor variations. Each string presented to a decoder
was enclosed with start-of-sequence and end-of-sequence inputs, a legacy of the
task originally considered for the encoder. The additional inputs did not
appear to have a considerable impact on the simulations.

                MSB First          LSB First
Hidden units:   1   2   3   5      1   2   3   5
Encoders        11  18  11  7      0   0   0   0
Decoders        0   0   0   0      22  26  30  22

Table 1. Number of networks (of 50 trialed) attaining 5-bit precision in each condition.

Unlike the encoder, the decoder is capable of decoding either MSB or LSB
first, though with some significant differences. Figure 2(b) shows an SRN that
decodes LSB first. Although an LSB decoder is able to decode strings of varying
lengths with only a single hidden unit, an MSB decoder (not shown) can only
decode strings of a fixed length with a single hidden unit. Simulations were
carried out in a similar manner to the encoder. A perfect encoder was used to
encode values to numeric binary sequences. Decoders were compared by the sum
squared error across all presented strings. The same principle of staged
learning was applied.
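One way to see the difference between the two decoding directions is to write each as a single-state recurrence. This is an illustration only and is not the network of Fig. 2(b): the LSB-first recurrence halves its state at every step and needs no knowledge of the string length, whereas the MSB-first version sketched here needs the length to scale its result. The function names are hypothetical.

```python
def decode_lsb_first(bits):
    """Single-state recurrence h <- (h + b) / 2; works for any string length."""
    h = 0.0
    for b in bits:
        h = (h + b) / 2.0
    return h


def decode_msb_first(bits, length):
    """Single-state recurrence h <- 2h + b, rescaled by a known, fixed length."""
    h = 0.0
    for b in bits:
        h = 2.0 * h + b
    return h / 2 ** length


if __name__ == "__main__":
    print(decode_lsb_first([1, 0, 1, 1]))     # 0.1101 in binary, sent LSB first
    print(decode_msb_first([1, 1, 0, 1], 4))  # the same value, sent MSB first
```

In the LSB-first recurrence the state always stays within [0, 1], which is compatible with bounded activations, while the MSB-first state grows with the string length.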

2.3 Results: Encoders and Decoders 

Networks of all sizes were able to encode MSB first sequences and decode LSB
first sequences of up to 5 bits. No networks could encode more than 2-bit
values LSB first. No networks were able to decode 5-bit sequences MSB first,
although one network with 5 hidden units was able to decode 4-bit sequences.
The results are broadly summarised in Table 1.

3 Evolving a Language 

There is clearly a significant difference between the encoders and decoders.
The encoders were only able to learn the MSB first encoding, whereas the
decoders preferred learning LSB first sequences. This presents a serious
dilemma when we consider the complete system of encoding and decoding (Fig. 1).
If the system is to successfully communicate values, then the encoder and
decoder must compromise on the nature of the code. An MSB or LSB code will not
suffice for the combined system.

Simulations of the complete system were performed under two conditions. In
the first, the communication channel reversed the message: whatever was sent
first by the encoder was received last by the decoder. This condition allows an
MSB code with the encoder encoding MSB first and the decoder decoding LSB
first. With the second condition, the order of the message on the communication
channel was preserved. In this scenario an MSB code is more difficult, and the
encoder and decoder must develop a code which can be effectively learned and
processed by both networks.

1. Create a champion encoder and decoder.

2. Create a mutant encoder by perturbing the weights of the champion.

3. If the encoding created by the mutant uses a greater variety of strings than
the champion, select that mutant.

4. Create a mutant decoder with weights initialised between -1.0 and 1.0.

5. For n iterations, present all inputs of the current precision to the
encoder. Train the decoder on the output of the encoder.

6. If the final sum squared error of the mutant encoder and decoder across all
strings is lower than that of the champions, make the mutants the champions.
Furthermore, if the mutants got all strings correct, increase the precision.
Return to step (2).

Fig. 3. Evolutionary algorithm for combined encoder/decoder system.

Pilot simulations showed that using a hill-climber for both encoder and
decoder was intractable. Tests of backpropagation through time (BPTT) on the
decoder showed that it was qualitatively similar with respect to the learning
task of Sect. 2.2, but faster. Hence, a hill-climber was used for the encoder
and BPTT for the decoder. The basic algorithm for the co-evolution of the
system is described in Fig. 3.

3.1 Forwards and Reversed 

Both the encoder and decoder used two hidden units, with the decoder trained
for 1000 epochs. The system was given n + 2 bits when communicating n-bit
values in the reversed case, and 2n bits in the forwards case. The extra bits
were found to be necessary for a successful code to develop and have the effect
of increasing the proportion of codes that uniquely identify each value.

In the reversed condition, the system was able to create successful codes for
5-bit values. A typical code is shown in Table 2. The code is effectively a
sparse numeric code. Although not all binary sequences are used, those that are
used are ordered by their numeric values.

The simulations performed with the forwards channel were not nearly as
successful as the reversed case. The best observed code, shown in Table 2,
encoded all 3-bit values. It is apparent that it is neither strictly MSB nor
LSB first, since there is no clear ordering of the significance of each bit
(less significant bits should tend to show greater sensitivity to changes in
the input).

4 Discussion and Conclusions 

The first series of simulations demonstrated the different constraints of the
encoder and decoder on the numeric encoding task. Whereas the encoder is only
able to encode values MSB first, the decoder has a preference for decoding
values LSB first. The second series of simulations are pilots and show how the
different constraints of the networks may affect the evolved code. In both cases,
co-evolution of the encoder and decoder was difficult. The primary cause of this
appeared to be the lack of quality information given to direct the encoder’s
search through a combinatorially large space (functions from values to strings).
Encouraging variability in the encoder proved a useful heuristic, since a
necessary condition for a successful code is that every value has a unique
encoding.

                  Reversed                      Forwards
Input     Message  Flipped  Output      Message  Flipped  Output
0.0000    100111   001101   0.0029      010111   000010   0.0000
0.0625    100100   001110   0.0420
0.1250    111001   010011   0.1211      010101   000000   0.1194
0.1875    111111   010101   0.1943
0.2500    110010   011000   0.2738      011101   001000   0.2425
0.3125    110011   011001   0.3136
0.3750    110000   011010   0.3608      111101   101000   0.3538
0.4375    001001   100011   0.4424
0.5000    001111   100101   0.4945      111111   101010   0.4977
0.5625    001100   100110   0.5412
0.6250    000011   101001   0.6105      111110   101011   0.6162
0.6875    000000   101010   0.6577
0.7500    000001   101011   0.7272      101110   111011   0.7414
0.8125    000111   101101   0.8002
0.8750    011000   110010   0.8488      101010   111111   0.8761
0.9375    011001   110011   0.9184

Table 2. Left: Language for the reversed system, 4 bit precision. The code
employed is not immediately apparent. Flipping alternate bits of the message
(bits 1, 3 and 5) in the third column shows that the messages are, in fact, in
numeric order. The bit-flipping behaviour is a consequence of having negative
recurrent weights that oscillate the significance of successive bits. Right: The
code from a forwards system for 3 bit precision. Flipping the bits of the
message results in a code which is almost in numeric order both left-to-right
and right-to-left, 0.0 proving the exception in both cases.
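
The bit-flipping check described in the caption of Table 2 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `flip` and the mask string are our own choices, and the message list contains the reversed-system messages for inputs 0.0625 and 0.1250 and for 0.2500 to 0.9375.

```python
def flip(message, mask):
    """XOR a bit string with a mask of the same length."""
    return "".join(str(int(m) ^ int(k)) for m, k in zip(message, mask))

# Reversed-system messages, in order of increasing input value.
REVERSED_MESSAGES = [
    "100100", "111001", "110010", "110011", "110000", "001001", "001111",
    "001100", "000011", "000000", "000001", "000111", "011000", "011001",
]

# Flipping bits 1, 3 and 5 is the same as XOR with the mask 101010.
flipped = [flip(m, "101010") for m in REVERSED_MESSAGES]
values = [int(bits, 2) for bits in flipped]

# The flipped messages increase with the input, i.e. the code is numeric.
is_numeric_order = values == sorted(values)
```

The raw messages are not monotonic when read as binary numbers, which is why the code is "not immediately apparent" before the flip.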

In the reversed condition, the biases of the networks were consistent and 
produced a numeric code. The system produced the type of encoding expected, 
given the results of the earlier component simulations. The codes developed 
in this condition were more sparse than strict numeric codes. Attempting to 
force a compact encoding on the encoder failed due to the small proportion of 
appropriate codes within the large search space. 

When the message sent by the encoder was not reversed (the forwards case)
the networks compromised on the code since neither was able to learn the
encoding preferred by the other. Although the simulations did not develop
codings to cope with significant levels of precision, they did give indications
that the system employed neither MSB nor LSB codes, but instead those that
could be read either backwards or forwards.

This is preliminary work and further simulations will be needed to substan- 
tiate the combined encoder/decoder study. We have presented a framework for 
studying the effects of constraints on the processing and emergence of language. 
The simulations presented here have been abstracted away from real languages, 
so an important goal of future work is to tie the framework more closely to nat- 
ural language. A number of extensions for this purpose are immediately obvious 
including the use of multi-dimensional inputs, more symbols in the language, a 
non-uniform distribution of inputs and a population of communicators. 

However, comparing the results of these initial simulations with human lan- 
guages shows some interesting parallels. In the unrealistic reversed case, a code 
develops which resembles a numeric code. In the forwards case, the networks cre- 
ate a code that can be read either forwards or backwards, which is less efficient 
but meets the constraints of both the encoder and decoder. This is reminiscent 
of the tendency in human languages towards palindrome-like structure (e.g. N1
N2 N3 V3 V2 V1) which can be parsed in either direction. In further studies we
hope to explore how certain features of human languages might have arisen as 
a compromise between the conflicting constraints of sender and receiver. 

Abstract. The elite genetic algorithm with adaptive mutations is applied to
two different continuous optimization problems: determination of the model
parameters of the optical constants of aluminum and thin film optical filter
design. The concept of adaptive mutations makes the employed algorithm a
versatile tool for solving continuous optimization problems. The algorithm has
been successful in solving both investigated problems. In the determination of
the optical constants of aluminum, excellent agreement between calculated and
experimental data is obtained. In the application to thin film optical filter
design, low-pass filters designed using this algorithm are clearly superior to
filters designed using the traditional approach.



1 Introduction 



Genetic algorithms (GAs) [1] are stochastic global search methods that mimic
the concept of natural evolution. Due to the nature of the algorithm, their
successful application is mostly restricted to optimization problems whose
solution can be conveniently represented in binary form. However, there is a
rising interest in applying genetic algorithms to continuous optimization
problems. For that reason, various modifications of the original GAs have been
reported [?]. The fact that real-coded GAs are superior to binary-coded ones in
continuous optimization has already been recognized [?]. When variables are
represented by real numbers, the length of the chromosome equals the number of
variables and is significantly smaller than in the case of binary coding.
Also, conversion of binary numbers into floating-point numbers and vice versa
is avoided. The most important advantages of real-coded GAs are the absence of
the Hamming cliff problem, which is inherent to all binary-coded GAs [?], and
the fact that variables cannot be altered in an undesired manner or destroyed
in the crossover operation. However, real-coded GAs can in certain cases be
blocked from further progress [?]. There has been much work on modifying
real-coded GAs in order to make them as successful in solving continuous
optimization problems as their binary counterparts are in solving discrete
optimization problems [2,3,9,10].

The main shortcoming of GAs in continuous optimization appears to be the
discrete sampling of the solution space, which means that the global minimum
can be located only roughly. Locating the global minimum more precisely
requires a huge number of chromosomes in the population. Several methods have
been proposed to overcome this difficulty. Obviously, if new values could be
introduced during the optimization procedure, fewer chromosomes would be needed
in the population to find a satisfactory solution of the problem. Since
mutation is usually considered to be of less importance than the selection and
crossover operators [1,8], work on the development of real-coded GAs suitable
for continuous optimization has concentrated on devising crossover operators
suitable for real numbers [2,3,9,10]. In the elite genetic algorithm with
adaptive mutations (EGAAM) [?], new values are introduced by the mutation
operator, while conventional operators are employed for selection and
crossover. Adaptive mutations are performed by completely replacing a specified
percentage of entire individuals by new ones, whose variable values are
generated within boundaries that are adaptively narrowed during the
optimization. In such a manner, the precision of locating the minimum is
improved. This algorithm is applied here to two different problems: model
parameter determination and thin film optical filter design.



2 Description of the algorithm 

We shall give a brief description of the employed algorithm. The algorithm uses
the floating-point representation [?], which has proved to be more convenient
for continuous optimization problems. In the floating-point chromosome
representation, each gene has the value of the corresponding variable $p(k)$,
$k = 1, \ldots, n_v$, where $n_v$ is the number of variables. The values $p(k)$
in the chromosomes of the initial population are given by
$p(k) = p_l(k) + (p_u(k) - p_l(k)) \cdot r$, where $r$ is a random number,
$r \in [0,1]$, and $p_l(k)$ and $p_u(k)$ are the initially set lower and upper
boundaries. In such a manner, the variables are confined to the specified
domain, ensuring that all variables have a physical interpretation. Due to the
nature of the problem, a slightly different chromosome representation is used
in the design of the low-pass optical filter, as follows. Each layer is
characterized by two real numbers: material code and layer thickness. The
refractive index of the layer is the m-th element of the sequence of refractive
indexes of the available materials, where m is the rounded value of the
material code of the layer.
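
The initialization rule and the material-code lookup described above can be sketched in Python as follows. The function names, the 0-based indexing of the material list and the clamping of out-of-range codes are assumptions made for this illustration; the text does not specify them.

```python
import random

def init_population(lower, upper, size, rng=random):
    """Create `size` chromosomes with p(k) = p_l(k) + (p_u(k) - p_l(k)) * r."""
    return [
        [lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]
        for _ in range(size)
    ]

def layer_index(material_code, indices):
    """Return the refractive index selected by a real-valued material code.

    Assumption: materials are indexed from 0 and out-of-range codes are
    clamped to the first or last material.
    """
    m = int(round(material_code))
    m = min(max(m, 0), len(indices) - 1)
    return indices[m]

# Hypothetical bounds for a two-variable problem and three made-up materials.
population = init_population([0.0, 1.0], [1.0, 3.0], size=5, rng=random.Random(0))
n_layer = layer_index(1.4, [1.35, 2.30, 2.80])
```

Because every gene is drawn between its own lower and upper boundary, each chromosome stays inside the physically meaningful domain by construction.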

EGAAM employs the elitist selection mechanism [12,13,14]. In elitist
selection, $P_s$ percent of the new generation is produced by selection, and
$P_c$ percent is produced by crossover. The $N_s = N \cdot P_s$ strings with
the best fitness, where $N$ is the number of strings in the population, enter
the next generation directly. The $N_c = N \cdot P_c$ strings in the new
population are generated by crossover among the parent strings, which are
chosen fitness-proportionally from all the strings in the current population.
Crossover is performed by generating a random integer
$N_1 \in [n_{min}, n_v]$, where $n_v$ is the number of variables, i.e. the
number of elements in the strings, and $n_{min}$ is the minimal number of
elements exchanged in the crossover. Then we generate random integers
$n_i \in [1, n_v]$, $i = 1, \ldots, N_1$, and swap the elements at positions
$n_i$. Adaptive mutations are implemented as follows. In the current
generation, the average value $\mu(k)$ of each parameter $p(k)$ is computed,
and $P_m$ percent of the chromosomes in the next generation are formed by
generating their genes in the same manner as during the creation of the initial
population, but within narrowed boundaries. The new boundaries for each
parameter are given by
$p_{new-u}(k) = p_{old-u}(k) - c \cdot (p_{old-u}(k) - \mu(k))$ and
$p_{new-l}(k) = p_{old-l}(k) + c \cdot (\mu(k) - p_{old-l}(k))$, where $c$ is a
predetermined positive number, $0 < c < 1$. In such a manner, a specified $P_m$
percent of every generation are entirely new chromosomes. EGAAM has been shown
to be superior to the conventional GA on three families of multiminima test
functions for 20, 50 and 100 variables.
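
A minimal Python sketch of the two operators described above, the boundary narrowing used by the adaptive mutations and the position-swapping crossover, is given below. It is not the authors' implementation: the function names are ours, positions are 0-based, and repeated swap positions are allowed because the text does not say otherwise.

```python
import random

def narrow_bounds(population, lower, upper, c):
    """Move each boundary toward the population mean mu(k) by a factor c."""
    size = len(population)
    new_lower, new_upper = [], []
    for k, (lo, hi) in enumerate(zip(lower, upper)):
        mu = sum(individual[k] for individual in population) / size
        new_upper.append(hi - c * (hi - mu))
        new_lower.append(lo + c * (mu - lo))
    return new_lower, new_upper

def swap_crossover(parent_a, parent_b, n_min, rng=random):
    """Swap the genes of two parents at a random number of random positions."""
    n_v = len(parent_a)
    child_a, child_b = list(parent_a), list(parent_b)
    n_swaps = rng.randint(n_min, n_v)       # number of exchanged elements
    for _ in range(n_swaps):
        pos = rng.randrange(n_v)            # assumption: positions may repeat
        child_a[pos], child_b[pos] = child_b[pos], child_a[pos]
    return child_a, child_b

# One-variable example: mean 0.3, initial boundaries [0, 1], c = 0.5.
low, high = narrow_bounds([[0.2], [0.4]], [0.0], [1.0], c=0.5)
kids = swap_crossover([1, 2, 3, 4], [5, 6, 7, 8], n_min=2, rng=random.Random(1))
```

New mutant chromosomes would then be drawn inside `low` and `high` with the same rule used for the initial population, so the sampled region shrinks around the population mean from generation to generation.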

3 Application to modeling the optical constants of 
aluminum 

In this section, the Lorentz-Drude (LD) model for the optical dielectric
function, which has often been employed for modeling the optical constants of
metals [?], is briefly discussed. It has been shown [?] that the dielectric
constant $\varepsilon(\omega)$ can be expressed in the following form

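$$\varepsilon(\omega) = 1 - \frac{\Omega_p^2}{\omega(\omega - i\Gamma_0)} + \sum_{j=1}^{k} \frac{f_j\,\omega_p^2}{(\omega_j^2 - \omega^2) + i\omega\Gamma_j}, \qquad (1)$$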

where $\omega_p$ is the plasma frequency, $k$ is the number of interband
transitions with frequency $\omega_j$, oscillator strength $f_j$ and lifetime
$1/\Gamma_j$, while $\Omega_p = \sqrt{f_0}\,\omega_p$ is associated with
intraband transitions with oscillator strength $f_0$ and damping constant
$\Gamma_0$. The model parameters are determined by minimizing the discrepancy
between calculated and experimental dielectric function values, employing the
objective function proposed in Refs. [?]. To investigate how many significant
digits the proposed algorithm obtains for the parameters of the LD model, we
have generated values of the dielectric function $\varepsilon_{test}(\omega)$
in the range from 6.3 meV to 15 eV, using the target parameter values given in
Table 1. To emulate a real set of experimental data more realistically, we have
generated another set of data with the same target parameter values, but with
Monte-Carlo generated Gaussian noise, which accounts for experimental
uncertainties of 0.5% in the reflectance data calculated from the generated
dielectric function values. The parameters obtained by EGAAM and the
conventional GA for both data sets are given in Table 1. Mutation in the
conventional GA is performed by changing the value of parameter $p(k)$ to
$p_{mut}(k) = p(k) + sgn \cdot \Delta p(k)$, where $sgn$ is a random number in
the interval $[-1,1]$, while $\Delta p(k)$ is the step value for parameter $k$.
It can be observed that EGAAM is clearly superior to the GA in terms of how
close the obtained values are to the target ones.

Table 1. Target and obtained parameter values for the data set without noise
and for the data set with noise.

                      Without noise       With noise
Parameter   Target    EGAAM    GA         EGAAM    GA
F           -         0.021    1.090      0.782    1.370
f_0         0.700     0.702    0.712      0.702    0.703
Γ_0         0.060     0.061    0.059      0.061    0.061
f_1         0.200     0.194    0.188      0.190    0.188
Γ_1         0.300     0.294    0.280      0.291    0.280
ω_1         0.400     0.401    0.397      0.401    0.402
f_2         0.300     0.308    0.387      0.313    0.378
Γ_2         0.300     0.305    0.354      0.309    0.354
ω_2         1.500     1.502    1.519      1.503    1.519
f_3         0.200     0.191    0.125      0.184    0.114
Γ_3         1.000     1.009    1.252      0.998    0.786
ω_3         2.000     2.026    2.297      2.036    2.196
f_4         0.050     0.050    0.046      0.051    0.064
Γ_4         3.000     3.053    2.956      2.980    3.105
ω_4         4.500     4.519    4.538      4.498    4.341
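
To make the Lorentz-Drude model concrete, the sketch below evaluates Eq. (1) for the target parameter values of Table 1. Several things here are assumptions rather than facts from the text: the sign convention of the imaginary terms (with the one chosen here the imaginary part of the result is negative), the value of the plasma frequency `OMEGA_P`, which is hypothetical, and the use of one common energy unit for all frequencies.

```python
def lorentz_drude(omega, omega_p, f0, gamma0, oscillators):
    """Evaluate the LD dielectric function at frequency `omega`.

    `oscillators` is a list of (f_j, gamma_j, omega_j) tuples.
    Assumed sign convention: damping enters as -i*gamma0 in the Drude term
    and +i*omega*gamma_j in the Lorentz terms.
    """
    drude_strength = f0 * omega_p ** 2          # Omega_p^2 = f0 * omega_p^2
    eps = 1.0 - drude_strength / (omega * (omega - 1j * gamma0))
    for f_j, gamma_j, omega_j in oscillators:
        eps += f_j * omega_p ** 2 / ((omega_j ** 2 - omega ** 2) + 1j * omega * gamma_j)
    return eps

# Target values from Table 1: (f_j, Gamma_j, omega_j) for j = 1..4.
TARGET_OSCILLATORS = [
    (0.200, 0.300, 0.400),
    (0.300, 0.300, 1.500),
    (0.200, 1.000, 2.000),
    (0.050, 3.000, 4.500),
]
OMEGA_P = 15.0  # hypothetical plasma frequency, not given in the text

eps_at_1 = lorentz_drude(1.0, OMEGA_P, 0.700, 0.060, TARGET_OSCILLATORS)
```

A fitting procedure such as EGAAM would call a function of this kind for every candidate parameter set and compare the result with the measured or generated data.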

There has been considerable interest in the optical properties of aluminum
[?]. We have chosen to model the optical properties of aluminum as another
test of our technique, since it is a well-known material, ensuring that the
results can be anticipated in advance. Fig. 1 shows the real and imaginary
parts of the dielectric function of aluminum as a function of energy. Excellent
agreement between calculated and experimental values, with a relative rms
error of about 6% for $\varepsilon_1$ and a relative rms error of 3% for
$\varepsilon_2$, can be observed.




Fig. 1. Real and imaginary parts of the dielectric constant of Al vs. energy (solid line 
- model, open circles - experimental data) 

4 Application to thin film filter design

Many electromagnetic applications require devices that exhibit specific
frequency-dependent properties. One such structure is an optical filter
consisting of dielectric layers. The structure is bounded by air on one side,
and by a substrate medium with known refractive index (usually glass) on the
other side. The structure is characterized by its reflectance R (the fraction
of incident energy that is reflected from the filter). Assuming that the filter
is lossless, which is valid for dielectric layers, the transmittance T (the
fraction of incident energy transmitted through the filter) equals 1 - R in the
case of normal incidence. Design of the optical filter amounts to choosing the
optimal layer materials and their thicknesses, or just the optimal thicknesses
if the dielectric properties of two alternating materials are given, in order
to obtain the desired frequency dependence of R. GA-based filter design
algorithms have several advantages over classical design procedures. Firstly,
a GA does not require a crude preliminary design to ensure convergence, since
it is not easily trapped in a local optimum, contrary to classical iterative
techniques. Secondly, the design procedure is independent of the nature of the
multilayer, as well as of the characteristics of the incident and substrate
media. Finally, the design objective can be changed easily by manipulating the
cost function. There are several studies employing GAs in thin film filter
design. In [?], a real-coded GA was used for optimizing the thicknesses of
alternating layers of two given materials, while the objective function
measured how closely the obtained reflectance characteristic approaches the
desired one over the prescribed frequency band. In [?], a similar objective
function and a real-coded GA were used, while not only the thicknesses but also
the refractive indexes of the layers were optimized. However, the main
shortcoming of this method is that the material properties of each layer take
continuous values within a given range, which gives no guarantee that a
material with such properties exists. In [?], a filter consisting of both
dielectric and metal layers was designed. The objective function measured the
heat trapping efficiency of the device, while the coding method was binary.
Binary coding enabled selection of the material for each layer from a database
of available materials.

We shall briefly describe the theoretical background of the propagation of an
electromagnetic wave through a series of dielectric layers with index of
refraction $n_i$ and thickness $d_i$, placed between two transparent media with
refractive indexes $n_0$ and $n_g$. In the case of one dielectric layer, the
relation between the electric and magnetic fields of the incident
$(E_I, H_I)$ and transmitted $(E_{II}, H_{II})$ waves is

$$\begin{bmatrix} E_I \\ H_I \end{bmatrix} = M \begin{bmatrix} E_{II} \\ H_{II} \end{bmatrix}, \qquad M = \begin{bmatrix} \cos(k_0 h) & \frac{i}{Y_1}\sin(k_0 h) \\ i Y_1 \sin(k_0 h) & \cos(k_0 h) \end{bmatrix}. \qquad (2)$$

Here, $k_0 = 2\pi/\lambda$ is the wavenumber of the incident electromagnetic
wave, $h = n_1 d \cos\theta_1$, $\theta_1$ is the angle of incidence, and
$Y_1 = \sqrt{\varepsilon_0/\mu_0}\, n_1 \cos\theta_1$. In the case of N layers
of dielectrics between two transparent media, we can assign to each layer a
matrix of the form of Eq. (2). The connection between the electric and magnetic
fields before $(E_I, H_I)$ and after $(E_{N+1}, H_{N+1})$ the multilayer
structure is described by

$$\begin{bmatrix} E_I \\ H_I \end{bmatrix} = M_1 M_2 \cdots M_N \begin{bmatrix} E_{N+1} \\ H_{N+1} \end{bmatrix}, \qquad M = M_1 M_2 \cdots M_N = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}. \qquad (3)$$

The coefficients of reflectivity and transmittivity are given by

$$r = \frac{Y_0 m_{11} + Y_0 Y_g m_{12} - m_{21} - Y_g m_{22}}{Y_0 m_{11} + Y_0 Y_g m_{12} + m_{21} + Y_g m_{22}}, \qquad t = \frac{2 Y_0}{Y_0 m_{11} + Y_0 Y_g m_{12} + m_{21} + Y_g m_{22}}, \qquad (4)$$

where $n_0$ and $n_g$ are the refractive indexes of the two transparent media,
while $Y_0$ and $Y_g$ are the corresponding admittances, defined in the same
manner as $Y_1$ above.

Fig. 2. Reflectance of the low-pass filter with cut-off wavelength (a)
$\lambda_0$ = 750 nm and (b) $\lambda_0$ = 600 nm. The GA optimized design
shows significantly reduced ripple in both the pass band and the stop band.

The ratios of the intensities of the transmitted and incident waves and of the
reflected and incident waves are the transmittance
$T = t t^* (n_g \cos\theta_g)/(n_0 \cos\theta_0)$ and the reflectance
$R = r r^*$. The multilayer in conventional thin-film optical filters usually
consists of alternating layers with high and low refractive index, whose
optical thicknesses $h = n_i d \cos\theta_i$ equal one quarter of the chosen
wavelength $\lambda_0$; there are also some filters with layers where
$h = \lambda_0/8$. The structure of the filter is often denoted using the
following symbols: a - air, g - glass, H - layer with high refractive index and
L - layer with low refractive index. Conventional low-pass filters have the
structure g-(0.5H)L(HL)^n(0.5H)-a, where 0.5H denotes a layer with high
refractive index whose $h = \lambda_0/8$.
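
The matrix formalism above translates directly into a short reflectance routine. The sketch below is restricted to normal incidence and lossless layers, and it drops the common factor $\sqrt{\varepsilon_0/\mu_0}$ from all admittances because it cancels in the expression for r. The function names, the default substrate index of 1.52 and the layer indices in the example are assumptions chosen for illustration, and layers are listed starting from the side of the incident medium.

```python
import math

def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one lossless layer at normal incidence."""
    phase = 2.0 * math.pi * n * d / wavelength   # k0 * h with h = n * d
    c, s = math.cos(phase), math.sin(phase)
    return [[c, 1j * s / n], [1j * n * s, c]]

def multiply(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def reflectance(layers, wavelength, n0=1.0, ng=1.52):
    """Return R = |r|^2 for `layers`, a list of (index, thickness) pairs."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for n, d in layers:
        m = multiply(m, layer_matrix(n, d, wavelength))
    (m11, m12), (m21, m22) = m
    numerator = n0 * m11 + n0 * ng * m12 - m21 - ng * m22
    denominator = n0 * m11 + n0 * ng * m12 + m21 + ng * m22
    return abs(numerator / denominator) ** 2

# Quarter-wave stack H L H L H L H at the design wavelength (made-up indices).
lam0 = 750.0
n_high, n_low = 2.30, 1.35
quarter_h = (n_high, lam0 / (4.0 * n_high))
quarter_l = (n_low, lam0 / (4.0 * n_low))
stack = [quarter_h, quarter_l] * 3 + [quarter_h]
r_stack = reflectance(stack, lam0)
```

Evaluating `reflectance` on a grid of wavelengths gives the kind of curve an optimizer would compare against the desired filter response.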

Fig. 3. Reflectance of the low-pass filter with cut-off wavelength
$\lambda_0$ = 350 nm. Pass band ripple is improved, especially in the
long-wavelength region. The peak reflectance in the stop band is a less
important feature in these filters.

The region of interest in the low-pass filter is the edge of the pass band,
rather than the peak reflection of the stop band. It is generally desired that
(a) the transition edge be as sharp as possible and (b) the transmission zone
have reflection as close to 0% as possible. Traditionally, the steepness of the
edge is improved by increasing the number of layers. This, in turn,
significantly increases the ripple in the pass band. Departing from the
traditional $h = \lambda_0/4$ layer thicknesses seems to be the only way to
achieve both requirements. In the EGAAM designed filter, the following
objective function was used to achieve the desired reflectance dependence on
wavelength



where $n_p$ is the number of points at which the reflectance is calculated,
$\lambda_0$ is the cut-off wavelength, and $R_o$ and $R_d$ are the obtained and
the desired reflectance, respectively. The squared difference between the
obtained and desired reflectance is multiplied by a Gaussian function to
enhance the sharp edge at the cut-off wavelength. In spite of this, the EGAAM
design still has improved pass band ripple, as compared to the traditional
design.
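
The exact form of the objective function is not legible in this copy, so the following Python sketch only illustrates the idea stated in the text: a squared reflectance error weighted by a Gaussian centred at the cut-off wavelength. The normalisation by the number of points, the parameter `sigma` (the width of the Gaussian) and the function name are assumptions, not the authors' formula.

```python
import math

def weighted_error(wavelengths, r_obtained, r_desired, lambda0, sigma):
    """Mean squared reflectance error with a Gaussian weight around lambda0.

    Assumed form: the weight exp(-(lam - lambda0)^2 / (2 sigma^2)) emphasises
    points near the cut-off wavelength; `sigma` is a hypothetical width.
    """
    total = 0.0
    for lam, r_o, r_d in zip(wavelengths, r_obtained, r_desired):
        weight = math.exp(-((lam - lambda0) ** 2) / (2.0 * sigma ** 2))
        total += weight * (r_o - r_d) ** 2
    return total / len(wavelengths)

# The same reflectance error costs more near the cut-off than far from it.
grid = [550.0, 750.0, 950.0]
desired = [1.0, 0.5, 0.0]
near = weighted_error(grid, [1.0, 0.6, 0.0], desired, lambda0=750.0, sigma=50.0)
far = weighted_error(grid, [1.0, 0.5, 0.1], desired, lambda0=750.0, sigma=50.0)
```

With a weighting of this kind the optimizer is pushed to fit the transition edge first, which matches the remark that the pass band ripple still improves even though the edge is favoured.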

We have compared the frequency characteristics of the reflectance for the
conventional and GA optimized thin film filter designs with 15 dielectric
layers. The results obtained for cut-off wavelengths of 750 nm and 350 nm are
shown in Fig. 2 and Fig. 3, respectively. The conventional low-pass filter
consists of alternating layers of cryolite and As2Se3, which was the most
favorable choice for the traditional design. The GA optimized filter used
layers chosen from a list of 16 available materials. It has been found that GA
optimized filters tend to preserve the alternation of layers with high and low
values of the refractive index, but the greater choice of available materials
enables finer tuning of the filter characteristics. It can be observed that for
all three cut-off wavelengths the GA optimized low-pass filters have
satisfactory performance in a wider spectral range. On the other hand, the
traditionally designed low-pass filter with $\lambda_0$ = 750 nm has
significant ripple in the stop band (especially in the visible range), which
can be very important from the application point of view. Its reflectivity
drops to very low values below 400 nm, while the GA optimized filter retains
high reflectivity in the entire visible range.




( 5 ) 

5 Conclusion

The elite genetic algorithm with adaptive mutations (EGAAM) is applied to
modeling the optical constants of aluminum and to low-pass thin film optical
filter design. In the determination of the model parameters of optical
constants, it is shown that EGAAM is capable of obtaining more significant
digits in the model parameter values than its conventional counterpart.
Excellent agreement between calculated and experimental data for the dielectric
function of aluminum is obtained. In the application to low-pass filter design,
the reflectance of the EGAAM optimized filter is superior to that of the
conventionally designed one in terms of the wider spectral range in which the
desired filter characteristics are achieved.

Abstract. Our work introduces an evolutionary approach applied to the design
of digital circuits. In particular, we address the case of synthesising a
controller for a simple CPU, a case study which has not been tackled by other
authors so far. In order to cope with this problem, a novel circuit evaluation
strategy has been employed, and new evolvable hardware system paradigms derive
from this technique. We show that the use of this new evaluation approach
allows smaller circuits to be achieved and promises to be effective when the
problem scales up. Furthermore, our methodology yields novel digital circuits
compared to conventional design.

Keywords: Evolutionary Hardware, Sequential Circuits, CPU control. 



1 Introduction 

This work applies artificial evolution as a tool for the automatic synthesis of
digital circuits. Digital design encompasses two major areas, combinational and
sequential circuits [1]. Although the majority of the applications of
evolutionary systems in circuit design have focused on combinational circuits,
the area of sequential systems promises to be more suitable for evolutionary
computation. This stems from the fact that sequential circuit design allows the
use of feedback connections, making it more complex for conventional techniques
[?].

Our preliminary study on the use of evolutionary computation in sequential
circuit design indicated the inability of simple genetic algorithms to handle
more complex tasks in the area, a fact observed in combinational design as
well. We propose in this work a new kind of evaluation strategy, in which
internal points of the evolving digital circuits are assessed together with the
circuit output. The authors have selected, as a case study, the evolution of a
CPU controller, since this illustrates a practical application for evolutionary
systems.

This article is composed of five additional sections: section 2 briefly reviews 
the area of sequential systems design and conventional tools used for that pur- 
pose. Section 3 presents the target problem, i.e., the particular architecture of 
the CPU for which the control circuit will be designed. Section 4 describes our 
evolutionary approach and section 5 presents the evolved circuit. Finally, section 
6 analyses our results. 

2 Sequential System Design

Figure 1 illustrates the basic topology of a sequential circuit. It can be seen
that a combinational circuit (formed by basic boolean gates) and storage
elements are interconnected to form this kind of topology [1]. The sequential
circuit receives binary information from its environment via the inputs. These
inputs, together with the present state of the storage elements, determine the
binary value of the outputs. A sequential circuit is, therefore, specified by a
time sequence of inputs, internal states and outputs [4].
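
The interplay of combinational logic and storage elements can be illustrated with a very small Python simulation. It is a generic sketch, not tied to any particular circuit in this paper: the function `run_sequential` and the toggle example are our own, and a single one-bit state stands in for the storage elements.

```python
def run_sequential(next_state, output, inputs, state=0):
    """Simulate a clocked circuit: combinational logic plus one stored bit."""
    trace = []
    for x in inputs:
        trace.append(output(state, x))   # outputs depend on input and present state
        state = next_state(state, x)     # the storage element is updated each clock
    return trace

def toggle_next(state, x):
    """Next-state logic: flip the stored bit whenever the input is 1."""
    return state ^ x

def toggle_out(state, x):
    """Output logic: here simply the value the state is about to take."""
    return state ^ x

outputs = run_sequential(toggle_next, toggle_out, [1, 0, 1, 1])
```

The same input value produces different outputs at different times, which is exactly what distinguishes a sequential circuit from a purely combinational one.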

SIS is a state-of-the-art tool for the synthesis and optimisation of sequential
circuits [5]. One of the main features of this tool is the exploration of
signal dependencies across memory element boundaries, instead of optimising
logic only within the combinational blocks. However, the design specification
must be supplied as a netlist of gates or a finite state machine transition
table, which requires prior knowledge of the system from the user.




Fig. 1. Block Diagram of a Sequential Circuit (extracted from [?])



3 Target Problem - Random Control Logic Unit 

The task of controlling the operations of a microprocessor is a typical example
of a sequential circuit task. The control unit enables the CPU to carry out the
instruction currently in the instruction register. This is accomplished through
the interpretation of the pattern of bits in the instruction register [?],
which generates a sequence of actions taking place during the execution of an
instruction [?]; the control unit is the circuit that provides this operation.
In particular, the random or hardwired logic control unit is made up of an
arrangement of boolean gates and flip-flops [1].

In [?], a simple model of a CPU is presented and a random logic control unit is
designed to allow the execution of eight different instructions. Using this CPU
model, we propose the task of evolving the control unit instead of designing
it. Figure 2 shows the structure of this primitive CPU; Table 1 shows the
interpretation of the machine-code instructions (note that the fetch cycle
occurs for all eight instructions). There are a total of 16 control signals,
which are clustered in 5 groups: Enable signals (E); Clock signals (C); ALU
signals; Main Store (MS) signals (Read and Write); and Flip-Flop (FF) signals
(Reset and Set). The evolutionary system must generate the signals Cmar, Embr,
Em, etc., given a particular instruction as input. For further details on the
CPU operation, refer to [?].

Table 1 - Interpretation of Machine Code Instructions (Reproduced from [?])

Fig. 2. Block Diagram of a Simple CPU (Reproduced from [?])



4 Problem Modelling 

This section describes both the representation and evaluation used within our 
evolutionary system. 

The gate level representation [2] has been used to encode each circuit into
an integer string. Figure 3 illustrates an example of this kind of
representation for a hypothetical output signal. The circuit (phenotype)
consists of a combinational part (an arrangement of boolean gates) and a
sequential part (a D flip-flop). The latter provides the means whereby a
delayed version of the output signals can be used as feedback for the same or
other circuits.

The genotype is made up of blocks of integer numbers, or genes, that encode
the type of each particular logic gate shown in the figure. The genes
associated with the gates of the first layer encode their nature and also the
source of their input signals. The cell input signals are chosen among the
following signals (Figure 4):

1. clock signals, supplied by a master clock and a counter;

2. the three bits of the instruction register that determine the instruction
to be executed;

3. and the output control signals themselves, delayed by one clock period.

As there are a total of 16 output signals, the overall system will be made up
of 16 cells like the one in the figure.
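
As an illustration of the gate level representation, the sketch below decodes an integer string into first-layer gates and evaluates them on a set of input signals. The gene layout (three integers per gate: gate type, first source, second source), the numeric codes for the gate types and the two-input restriction are assumptions made for this example; the paper only states that genes select the nature of a gate and the sources of its inputs.

```python
# Hypothetical numeric codes for the five gate types named in the paper.
GATES = {
    0: lambda a, b: a & b,          # AND
    1: lambda a, b: 1 - (a & b),    # NAND
    2: lambda a, b: a | b,          # OR
    3: lambda a, b: 1 - (a | b),    # NOR
    4: lambda a, b: a ^ b,          # XOR
}

def decode(genotype):
    """Split a flat integer string into (gate, source_a, source_b) genes."""
    return [tuple(genotype[i:i + 3]) for i in range(0, len(genotype), 3)]

def first_layer(genotype, signals):
    """Evaluate every first-layer gate on the available input signals."""
    return [GATES[gate](signals[a], signals[b]) for gate, a, b in decode(genotype)]

# Example inputs: a clock bit, two instruction bits and one delayed output.
signals = [1, 0, 1, 1]
genotype = [0, 0, 2, 4, 1, 3]       # AND(s0, s2) and XOR(s1, s3)
layer_values = first_layer(genotype, signals)
```

In a full cell the first-layer values would feed further gates and, finally, the output gate and its D flip-flop.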




Gene selects gate’s nature (AND, NAND, OR, NOR and XOR) 
and inputs’ sources 



Fig. 3. Gate level representation of a sequential circuit 

Fig. 4. Inputs available for the evolutionary system

The fitness evaluation function was initially designed simply to count the
number of hits in each cell output, compared with the target output signals.
However, this approach proved to be unsuccessful for some output signals. The
authors devised a way to overcome the problem by providing additional signals
to the fitness evaluation function. These new signals are taken from internal
circuit points. Figure 5 illustrates this procedure: circuits (A) and (B) have
two internal points and the external output probed. In order to implement this
new evaluation strategy, it is necessary to set target functions for the
internal points. This has been accomplished through the so-called OR and NOR
evolvable hardware paradigms, in which the output gates are fixed to either OR
(Circuit A) or NOR (Circuit B) functions, respectively. OR and NOR gates have
been chosen because they simplify the assessment of the internal points
compared to other gates. When using the OR paradigm, the fitness of the
internal points is computed by taking as target function the very output
function to be realised by the circuit, because the OR gate performs a simple
boolean sum. Conversely, when using the NOR paradigm, the fitness of the
internal points is calculated by taking as target function the complement of
the output function, since the NOR gate performs a complemented boolean sum.
The circuit shown in Figure 3 illustrates this strategy as well. Its output
gate has been fixed to a NOR boolean function (2 ORs followed by 1 NOR = 1
NOR); five points are then evaluated, the final output and four internal
points.




Fig. 5. New evaluation strategy

It has been verified that, for some control signals, the OR paradigm brought
a significant improvement in performance, whereas the NOR paradigm was more
effective for other signals. This improvement stems from the fact that the
circuit behaviour is now constrained at its internal points, which makes the
search process focus on a smaller set of solutions.

To further improve the GA performance, penalties have been applied to
deleterious sub-circuits. For instance, when the output gate is fixed as an OR,
the individual’s fitness is penalized when internal circuit points produce a
’1’ output when the target is ’0’. This is because a ’1’ will clamp the circuit
output to an erroneous value, regardless of the values of the other internal
points. A similar method is applied in the NOR paradigm.
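
The evaluation strategy with internal points and penalties can be sketched for the OR paradigm as follows. The scoring weights are assumptions: one point per hit at the output and at each internal point, and a penalty of one point for every internal '1' where the target is '0'. The paper does not give the actual values, and the function name is ours.

```python
def or_paradigm_fitness(internal_traces, target, penalty=1):
    """Score a cell whose output gate is fixed to OR.

    `internal_traces` holds one bit sequence per internal point and
    `target` is the desired output sequence.
    """
    steps = len(target)
    output = [int(any(trace[t] for trace in internal_traces)) for t in range(steps)]
    score = sum(o == y for o, y in zip(output, target))   # hits at the output
    for trace in internal_traces:
        for value, y in zip(trace, target):
            if value == y:
                score += 1            # hit at an internal point
            elif value == 1 and y == 0:
                score -= penalty      # a stray 1 clamps the OR output to 1
    return score

target = [0, 1, 1, 0]
good = or_paradigm_fitness([[0, 1, 0, 0], [0, 0, 1, 0]], target)
bad = or_paradigm_fitness([[1, 1, 0, 0], [0, 0, 1, 0]], target)
```

For the NOR paradigm the same routine could be reused under the same assumptions by comparing the internal points with the complement of the target and the output with the target itself.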

5 Results 

In order to evolve the whole control system, 16 genetic algorithms have to be 
executed, one for each circuit output. The authors adopted the following strat- 
egy: 

1. Run the 16 GAs for each signal, assuming a delay flip-flop in each circuit 
output; 

2. Find the output signal(s) which was(were) hardest to evolve, and store the 
delayed output signals used as inputs to this(these) cell(s); 

3. Re-run the GA for the other signals, keeping available only the delayed
signals used by the circuit(s) representing the signals mentioned in the second
item.

The aim of this strategy is to minimise the amount of hardware, by placing
a delay flip-flop only in those signals used as inputs to the cells which have
been more difficult to evolve. In our particular case, the evolution of the
RESET signal was the most time-consuming. The output function of this control
signal is depicted in the column R (FF) of Table 1. The OR paradigm has been
used and, due to the task complexity, we allowed the GA to use all the
available input signals (see Figure 4). Figure 6 shows the evolved circuit as
well as its input signals. It has also been verified that the cell could be
simplified by taking away a sub-circuit which was not effectively contributing
to the final behaviour. The possibility of cutting hardware from the final
solution is another advantage of the OR paradigm.

In order to evolve the other signals, we allowed the GA to use only the delayed
output signals used as inputs to the RESET circuit. After simplifying the circuit
in Figure 6, it can be verified that only 6 out of 16 control signals will need
a delay flip-flop: MBR-Enable, the Reset signal itself, PC-Clock, MBR-Clock,
DO-Enable and SET (signals with time index T-1).

Fig. 6. Circuit evolved to generate the RESET control signal

The graph of Figure 7 compares the evolution of the particular signal
ALU-ENABLE when the OR and NOR paradigms are used. The graph shows the average
value of the best genotypes over five executions, along 300 generations for 40
individuals. It took around 4 minutes to run the executions on a SPARC 4
workstation. One of the evolved circuits for the ALU-ENABLE signal is shown
in Figure 8. It can be verified that this circuit could be greatly simplified, and
there is no need for an output flip-flop. Due to space limitations, we cannot
show the other control circuits evolved. The final solutions have been checked
using the PSPICE simulator.




Fig. 7. Average Fitness of the Best Genotypes for the ALU-ENABLE signal using OR 
and NOR Paradigms 



6 Analysis of the Results 

We can compare the evolved CPU controller with a human designed one shown in 
P[j. The evolved circuit uses six additional flip-flops, meaning that evolutionary 
systems do not use the minimum number of states in the synthesis of the
sequential system. In terms of boolean gates, the evolved controller uses around 
150 gates, against 90 of the human designed one. Nevertheless, the authors are 
confident that the amount of hardware can be reduced in further experiments. 

Fig. 8. Circuit evolved to generate the ALU-ENABLE control signal

Our proposed design approach has the advantage of using minimum designer
knowledge of the target system and of achieving novel digital circuits. The former
property reveals an advantage over conventional CAD tools like SIS. The latter
property refers to the fact that the evolved circuits depart from the constrained 
spatial structures observed in conventional circuit design p]. Two main benefits 
arise from this feature: the achievement of new design methodologies and the 
potential of evolutionary tools to handle more complex designs. 

The authors have also presented a new evolvable hardware technique, in 
which internal circuit points are evaluated. This strategy can be generally applied
to the evolution of both combinational and sequential circuits.
Abstract. This paper introduces and describes a number of novel models of
evolutionary designing that go beyond genetic algorithms and genetic programming,
which treat designing as search. The focus in many of these novel approaches
when applied to designing is to add to the range of possible designs which can
be produced during an evolutionary process. Four approaches are
briefly described: genetic engineering; reverse engineering of emergent features
in the phenotypes; developmental biology; and generalizing crossover.



1 Introduction 

The basic genetic analogy in designing utilises a simple model of the Darwinian the- 
ory of improvement of the organism’s performance through the “survival of the fit- 
test”. This occurs through the improvement of the genotype which goes to make up the 
organism. This is the basis of most evolutionary systems. Fundamental to this analogy 
are a number of important operational aspects of the model: 

• the design description (structure) maps on to the phenotype 

• separation of the representation at the genotype level from that of the design 
description level 

• the processes of designing map on to the evolutionary processes of crossover 
and mutation at the genotype level 

• performances (behaviours) of designs map on to fitnesses 

• operations are carried out with populations of individuals. 

In designing terms this maps directly onto the method of designing as search. We 
can describe this notion using the state-space representation of computation: 

• state space is fixed at the outset 

• state space comprises behaviour (fitness) and structure (phenotype) spaces 

• genetic operators move between states in structure space, performance 
evaluated in behaviour space. 
Designing as search is a foundational designing method but one that is restricted in
its application to routine or parametric designing. In such designing all the possible
variables which could occur in the final design are known beforehand as are all the
behaviours which will be used to evaluate designs. Since the goal is to improve the 
behaviours of the resulting designs, the processes of designing during search map well 
onto those of optimization. This sits well with our notion of genetic algorithms and 
genetic programming. They can be readily viewed as robust optimization methodolo- 
gies. Genetic algorithms and genetic programming have been used successfully as 
analogies of designing methodologies. 

In this paper we will briefly explore new approaches which can be drawn from na- 
ture and humans’ intervention in nature as possible sources for fruitful ideas on which 
to base evolutionary designing methodologies. We will look at four such approaches: 
genetic engineering, reverse engineering and the genetic analogy, developmental biol- 
ogy and a generalization of the crossover operation. 



2 Genetic Engineering in Designing 

The practice of genetic engineering in natural organisms involves locating genetic 
structures which are the likely cause of specified behaviours in the organism [1]. This 
provides a direct analog with finding significant concepts during the process of de- 
signing and giving them a specific primacy. The behaviour of the organism is an ob- 
servable regularity which maps onto the concept and the structure of the genetic mate- 
rial which causes that behaviour is a representation of that concept, albeit a represen- 
tation which has to be expressed in the organism for the concept to appear. The prac- 
tice of genetic engineering is akin to the reverse of synthesis in the sense that one 
aspect of an already synthesised design is converted into the means by which it could 
be generated. In fact it is more complex than that since it is the behaviour of the al- 
ready synthesised design which is the controlling factor but the analogy still holds. Let 
us examine in a little more detail the concept of genetic engineering [2].

Consider Figure 1 where the population of designs is divided into two groups (it 
could be more). One group exhibits a specific regularity whilst the other does not. The 
goal is to locate an “emergent” common structure in the genotypes of those designs 
which exhibit this regularity. Here “emergent” means that the structure was not inten- 
tionally placed there but could be found and represented for later use. Genetic engi- 
neering at this symbolic level uses pattern matching and sequence analysis techniques 
to locate these genetic structures. The process can be summarised as follows: 

• locate emergent properties in the behaviour (fitness) space 

• produce new genes which generate those emergent properties → gene evolution

• introduce evolved genes into gene pool. 
Figure 1. Genetic engineering is concerned with locating regularities in groups of genes,
marked as X, in the genotypes of those designs which exhibit a specific behavioural regularity.

These newly “evolved” genes capture some problem specific characteristics of the 
genetic representation of the good solutions to that problem. As such they may be able 
to be re-used in related problems to advantage. Typically each new problem to be 
solved using optimization techniques is treated anew without taking into account
anything which has been learned from previous problems. Genes evolved using genetic
engineering provide the basis for learning from previous design episodes and
transferring what has been learned to the current design problem.
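The search for an emergent common structure can be sketched as a simple pattern-matching step. This is only an illustration under stated assumptions: genotypes are taken to be strings, a candidate "evolved gene" is a fixed-length substring, and the function name `emergent_patterns` is hypothetical. The sequence analysis techniques mentioned in the text may be considerably richer.

```python
def emergent_patterns(with_feature, without_feature, length):
    """Return substrings of the given length that occur in every genotype
    of `with_feature` and in no genotype of `without_feature`.

    Assumes `with_feature` is non-empty and genotypes are plain strings.
    """
    def substrings(genotype):
        return {genotype[i:i + length]
                for i in range(len(genotype) - length + 1)}

    # Patterns shared by all designs that exhibit the regularity.
    common = set.intersection(*(substrings(g) for g in with_feature))
    # Patterns that also show up in designs lacking the regularity.
    seen_elsewhere = set().union(*(substrings(g) for g in without_feature))
    return sorted(common - seen_elsewhere)
```

A pattern returned by this function could then be added to the gene pool as a single composite gene, which corresponds to the last step of the process summarised above.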



3 Reverse Engineering and the Genetic Analogy 

In the computational model of genetic engineering used in designing the evolved 
genes are complexes of the original genes. Even when they are mutated they remain 
complexes of the original genes. As a consequence the boundary of the state space of 
possible designs is unchanged so that the designs produced are no different to those 
which could have been produced using the original genes only. In order to produce 
novel designs, ie designs which could not have been produced using the original genes 
only, the evolved genes need to be different to simply being complexes of the original 
genes. In order to “evolve” such genes different processes are required. We can take 
ideas from reverse engineering in manufacturing and include them in the genetic anal- 
ogy [3]. 

The concept is analogically similar to that of genetic engineering in that emergent 
properties are looked for and new genes which generate those properties are produced, 
although the processes are different and the result is quite different. The process can 
be summarised as follows: 
• locate emergent design (phenotype rather than fitness) properties

• reverse engineer new genes which can generate those emergent properties
→ gene evolution

• introduce evolved genes into gene pool. 

The critical differences between this and genetic engineering occur in two places in 
this process. The first difference is in the locus of emergent properties - these are 
looked for in the phenotype, ie in the designs themselves rather than in their fitnesses 
or performances. The second difference is in the means by which “evolved” genes are 
created. 

Having located an emergent feature the next step is to reverse engineer a new gene 
which is capable of producing that emergent feature. This new “evolved” gene is then 
added to the gene pool. A variety of machine learning-based methods is available for 
this task. These include inductive substitution of the new representation in the place of 
the original representation in the design generator, turning constants into variables, 
and rule-based induction methods. 

Evolving genes by reverse engineering is a form of Lamarckism in that characteris- 
tics of an organism not directly produced by its genetic makeup are acquired by that 
organism’s genome. 



4 Developmental Biology and Designing 

Perhaps more interesting is to specifically model phenotypic plasticity to produce a 
form of pleiomorphism. This would allow for a form of genotype/phenotype environ- 
ment interaction during the development of the phenotype. A variety of environmental 
interactions can be proposed to allow for adaptive mapping between genotype and 
phenotype. Classes of interactions include the following, where “f” is some function:

• phenotype = f(genotype, situation), where situation refers to a state of the
environment at some time, or

• phenotype_t = f(genotype, phenotype_{t-1})

both in lieu of:

• phenotype = f(genotype).

Example 1 

Here the phenotype is made up of components but the components themselves 
are some function of the path taken to reach that component. A simple path func- 
tion would be that each component is in some way a function of the components it 
is connected to, ie: 

• phenotype = {component_1, ..., component_i, ..., component_n}

• component_i = f(component_{i-1}, path[i-1, i]).
Example 2

Here the phenotype is developed over some intermediate time periods from a
given genotype, during which various intermediate fitnesses control its
development in a pleiomorphic sense, ie:

• phenotype = f(genotype, intermediate fitnesses during development) 

Example 3 

Here the phenotype, as it develops over some intermediate time periods from a
given genotype, does so as a function of its expression at the previous time period.
This is a crude model of cell division, ie:

• phenotype_t = f(genotype, phenotype_{t-1}).

Models such as these provide opportunities to include both problem- and domain- 
specific knowledge in the evolutionary process. 
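The recurrence of Example 3 can be made concrete with a small sketch. Everything specific here is an assumption: the genotype is modelled as a table of rewriting rules, the phenotype as a string of cells, and the names `develop` and `divide` are hypothetical.

```python
def develop(genotype, step, initial, periods):
    """Iterate phenotype_t = step(genotype, phenotype_{t-1}) and return
    the list of phenotypes from the initial one to the final one."""
    phenotype = initial
    history = [phenotype]
    for _ in range(periods):
        phenotype = step(genotype, phenotype)
        history.append(phenotype)
    return history


def divide(genotype, phenotype):
    """Hypothetical cell-division rule: every cell in the current phenotype
    is replaced by the cells that the genotype prescribes for it."""
    return "".join(genotype[cell] for cell in phenotype)


# Usage: a two-rule genotype developed over three time periods.
rules = {"A": "AB", "B": "A"}
stages = develop(rules, divide, "A", 3)  # ["A", "AB", "ABA", "ABAAB"]
```

Replacing `divide` with a step function that also receives an environment state or an intermediate fitness would give the other classes of interaction listed above.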



5 Generalizing Crossover as an Operator in Designing 

Crossover is one of the fundamental genetic operations in evolutionary systems,
particularly genetic algorithms and genetic programming. Formally, any genotype, g_c,
produced by a crossover operator from genotypes g_1 and g_2 can be written as an
interpolation:

g_c(t) = f(t) g_1 + (I - f(t)) g_2,   t = 0, 1, ..., n

where I is a unit n-dimensional matrix with all diagonal elements equal to 1 and all
other elements equal to 0, f(t) is the n-dimensional matrix obtained from the unit
matrix by setting all diagonal elements from the t-th to the n-th to zero, f(0) = I and
f(n) = O, where O is the n-dimensional zero matrix.
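The equivalence between one-point crossover and this diagonal-matrix interpolation can be checked with a short sketch. The indexing is an assumption chosen so that f(0) = I and f(n) = O: here f(t) keeps the first n - t genes and zeroes the last t. Because f(t) is diagonal, only its diagonal is stored, and the function names are hypothetical.

```python
def diag_f(t, n):
    """Diagonal of f(t): ones for the first n - t genes, zeros for the last t
    (assumed indexing, so that f(0) = I and f(n) = O)."""
    return [1] * (n - t) + [0] * t


def crossover_as_interpolation(g1, g2, t):
    """Evaluate g_c(t) = f(t) g1 + (I - f(t)) g2 element by element."""
    f = diag_f(t, len(g1))
    return [f_k * a + (1 - f_k) * b for f_k, a, b in zip(f, g1, g2)]


def one_point_crossover(g1, g2, t):
    """Conventional one-point crossover with the cut placed n - t genes in."""
    cut = len(g1) - t
    return g1[:cut] + g2[cut:]
```

For bitstring parents of equal length the two functions return the same child for every t from 0 to n, which is the sense in which crossover samples interpolation points between g_1 and g_2.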

From this characterisation the crossover operation can be viewed as a random
sampling of interpolating genotypes between two basic points g_1 and g_2 [4]. Note that
this linear matrix interpolation, which corresponds to the standard one-point crossover,
is only one of many possible methods of interpolation between two genotypes in
genotypic space of the following form:

g_i(t) = c_1(t) g_1 + c_2(n-t) g_2

where operators c_1(t) and c_2(n-t) obey the conditions c_1(0) = I and c_1(n) = O, and
c_2(0) = I and c_2(n) = O. The crossover-induced interpolation g_c(t) is singled out from
many other possible interpolations g_i(t) by the condition that the sum of the Hamming
distances from g_c(t) to g_1 and to g_2 plus a penalty function (any kind of standard
optimization penalty function will do) is to be optimized for two sequential coordinates
in g_c(t), one of which coincides with the component of g_1 and the other with the
component of g_2. Different versions of crossover can be constructed by choosing
different conditions imposed on the interpolation points.

Since each genotype corresponds to a unique phenotype, the crossover-induced
interpolation operation between two genotypes maps onto an interpolation operation
between two corresponding phenotypes p_1 = M(g_1) and p_2 = M(g_2). If p_c(t) = M(g_c(t))
for t = 0, 1, ..., n, and assuming that P is a linear space, we can fit a path between
p_1 and p_2 and p_c(t), using the following formula:

p_c(t) = f^c(t) p_1 + g^c(n-t) p_2,   t = 0, 1, ..., n

where f^c(t) and g^c(t) are operators which depend continuously on t. Since p_c(0) = p_1
and p_c(n) = p_2, the weakest conditions these operators must satisfy are f^c(0) = I,
f^c(n) = O and g^c(0) = I, g^c(n) = O (where I is the unit operator whose application to
any phenotype gives the same phenotype and O is the zero operator whose application to
any phenotype gives an empty phenotype). If we use any operators f(t) and g(t) which
differ from f^c(t) and g^c(t) but still obey these conditions then this formula will
produce interpolation points which differ from those produced by standard genetic
crossover.

One way to conceive of this generalization of the crossover operator is to think of
the standard crossover operator forcing interpolated results to lie on a surface in
phenotypic space P, defined by the bitstring representation of the genotype and the
isomorphic mapping between the genotype and the phenotype. Thus, any phenotypes
which are a result of this crossover lie on a trajectory which is constrained to lie on
this surface, as indicated in Figure 2. The generalized crossover in the form of an
interpolation generalizes P to P+ (which is a superspace with respect to P, i.e. P ⊂ P+).
The generalized crossover consists of interpolating trial points directly in P+ using
trial points from P as the end points of the interpolation. They are shown in Figure 2
with the dotted line. The expectation is that since these end points belong to the
established search space P, the exploration due to interpolation in the enlarged P+ will
not distort the consistency and viability of the space P too much. The critical effect
can be noticed in Figure 2, namely that the interpolation in P+ does not lie in P. In
addition to interpolation we can now produce extrapolations, shown with arrows in
Figure 2. These also lie outside P and in P+. Hence, these interpolations have the
capacity to produce designs outside the original state space. The interpolation process
expands the state space of possible designs.
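A minimal sketch can show how direct interpolation and extrapolation leave the set reachable by standard crossover. It assumes, purely for illustration, that phenotypes are real-valued vectors and that the interpolation is a straight line. The names `blend` and `crossover_children` are hypothetical, and the paper's operators may be far more general.

```python
from itertools import product


def blend(p1, p2, s):
    """Point on the straight line through p1 and p2.
    s in [0, 1] interpolates; s outside [0, 1] extrapolates."""
    return [(1 - s) * a + s * b for a, b in zip(p1, p2)]


def crossover_children(p1, p2):
    """All vectors whose every component is copied from one parent,
    i.e. an upper bound on what component-wise crossover can produce."""
    return [list(child) for child in product(*zip(p1, p2))]


p1, p2 = [0.0, 2.0], [4.0, 6.0]
midpoint = blend(p1, p2, 0.5)   # interpolation between the parents
beyond = blend(p1, p2, 1.5)     # extrapolation past the second parent
```

Neither `midpoint` nor `beyond` appears among `crossover_children(p1, p2)`, which mirrors the claim that interpolation in P+ yields points outside P.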

This is significant in designing as it allows for the generation of novel designs,
designs which could not have been evolved using the standard crossover. Any genotypic
representation can be reduced to a bitstring of length n. All possible genotypes lie within
the space of 2^n possible designs. Without either increasing the length of the genotype
or introducing new members of the alphabet (beyond 0 and 1), it is not possible to 
expand the state space. The approach introduced here solves this problem by devel- 
oping a homomorphism between the phenotype and its genetic representation. It does 
away with the separate bitstring genotype representation, replacing it with this homo- 
morphism after the exploration process. The interpolation and extrapolation processes 
operate on the phenotype, changing it. As a consequence of this homomorphism a new 
genotypic representation is constructed each time exploration occurs. 
A large number of possible interpolation functions may be used, not all of which
will produce viable results, as there is a close connection between the useful
interpolation functions and the representation employed.





Figure 2. The illustration of the crossover-induced interpolation in P and direct interpolation in 
enlarged space P+. The enlarged space P+ represents the complete 3-d space and the set P 
represents the surface in it. The solid line represents an interpolation in P, whilst the dotted line 
represents an interpolation in P+ [4]. 



6 Discussion 

The genetic engineering analogy has been applied in designing in a number of dispa- 
rate ways [5, 6, 7]. The reverse engineering approach has been used with emergent 
feature detection in state-space search to add to the alphabet of the genotype [8]. The 
generalization of crossover approach has been used to extend the state-space in de- 
signing [4]. 

The genetic analogy in designing has been based on a model of Darwin’s survival 
of the fittest. This has provided a foundation for a body of important work which has 
implicitly treated designing as a search method largely akin to optimization. The effect 
of this in designing terms has been to set up a fixed state-space which is then searched 
for appropriate solutions. Alternative analogies drawn from genetics, reverse engi- 
neering, developmental biology and alternate views of the crossover operation offer 
the opportunity to change the state-space of possible designs in some cases. In
designing this ability to modify the state-space of possible designs is conceptually and
practically important.

Designing involves not just finding the best solution from a subset of possible solu- 
tions, it also involves determining what the possible solutions might be. A fixed search 
space does not allow for the exploration of possible solutions. Some of the novel
methods described in this paper point to possible directions in which such explorations
may be modeled.
Abstract. Robustness has long been recognised as a critical issue for
co-evolutionary learning. It has been achieved in a number of cases, though
usually in domains which involve some form of non-determinism. We
examine a deterministic domain - a pseudo real-time two-player game
called Tron - and evolve a neural network player using a simple
hill-climbing algorithm. The results call into question the importance of
non-determinism as a requirement for successful co-evolutionary learning, and
provide a good opportunity to examine the relative importance of other
factors.

Keywords: Co-evolution, Neural networks 



1 Introduction 

In 1982, Walt Disney Studios released a film called Tron which featured a game 
in a virtual world with two futuristic motorcycles running at constant speed, 
making only right angle turns and leaving solid wall trails behind them. As the 
game advanced, the arena filled with walls and eventually one opponent would 
die by crashing into a wall. This game became very popular and was subsequently 
implemented on many computers with varying rules, graphic interpretations and 
configurations. 

In earlier work we built an interactive version of Tron (using Java) and 
released it on the Internet. With this setup, we created a new type of co- 
evolutionary learning where one population consists of software agents controlled 
by evolving genetic programs (GP) [S| and the other population is comprised of 
human users. From a human factors standpoint, the fact that this simple game 
has attracted a large number of users and that many of them return to play 
multiple games is surprising and significant. From an evolutionary programming 
standpoint, the fact that the GP players have evolved to embody a robust set of 
strategies, capable of overcoming a wide range of human behaviours, is noteworthy.

We have been studying co-evolutionary learning environments in several con- 
texts mm , trying to understand the reasons why this paradigm works very well 
for some tasks I6TT0ITT] but poorly for others. In particular, we have developed a 
“minimalist” co-evolutionary learning method that consists of a neural network 
which evolves using a simple hill-climbing algorithm. We have found this to be 
a useful means for studying the effect of co-evolutionary learning in various task 
domains. 

Previously, we have applied this method successfully to backgammon [Sj as 
well as a simulated robotic hockey game called Shock [T] . Tron is similar to these 
domains in some respects but differs in other, significant, aspects. Backgammon 
is a stochastic domain in which the outcome of each game is influenced by ran- 
dom dice rolls as well as choices made by the players. In the Shock domain, 
each game is started from a different random initial condition. Tron, on the 
other hand, is totally deterministic in the sense that two games played by the 
same two opponents will necessarily be identical. Since many authors have cited 
non-determinism as a critical factor in the success of co-evolutionary learning 
systems, particularly in relation to backgammon, we were interested to apply 
our hill-climbing procedure to a deterministic domain, hence Tron. 

This paper is organised as follows: we first describe the Tron implementation 
and the network architecture and algorithm. We then detail some of the experi- 
ments that were conducted to compare the neural network players with the GP 
players evolved in the Internet experiment. We conclude with some discussion 
and ideas for extending this work further. 

2 Tron 

Our interpretation of Tron abstracts the motorcycles and represents them only 
by their trails. Players may move past the edges of the screen and re-appear on 
the opposite side to create a wrap-around, or toroidal, game arena. The size of 
the arena is 256 x 256 pixels. Two players start at positions 128 pixels apart, 
heading in the same direction. One player is controlled by the computer (e.g. 
a GP), the other may also be controlled by the computer, or by a human user. 
The GP players are provided with 8 simple sensors with which to perceive their 
environment. 

Figure 1 Robot sensors. Each sensor evaluates the distance
in pixels from the current position to the nearest obstacle
in one particular direction. Every sensor returns a maximum
value of 1.0 for an immediate obstacle (i.e. a wall in an adjacent
pixel), a lower number for an obstacle further away, and
0.0 when there are no obstacles in sight.
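A single sensor of this kind might be sketched as follows. The source gives only the end points of the scale (1.0 for an adjacent wall, 0.0 for nothing in sight), so the 1/distance scaling, the `max_range` cut-off, and the representation of walls as a set of pixels are all assumptions made for illustration.

```python
def sensor(walls, pos, direction, size=256, max_range=256):
    """Distance sensor on a toroidal arena.

    walls: set of (x, y) pixels occupied by trails.
    pos: (x, y) of the player; direction: (dx, dy) unit step.
    Returns 1.0 for a wall in the adjacent pixel, a smaller value for a
    wall further away (assumed 1/distance), and 0.0 if none is found
    within max_range steps.
    """
    x, y = pos
    dx, dy = direction
    for distance in range(1, max_range + 1):
        # Wrap around the edges of the arena.
        x, y = (x + dx) % size, (y + dy) % size
        if (x, y) in walls:
            return 1.0 / distance
    return 0.0
```

Evaluating this function for eight different directions would yield an input vector of the size used by the players described in the text.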




The game runs in simulated real-time, where each player can select one of 
the following actions: left, right or straight. 
In the Internet experiment, data have been collected since September 1997
and are still accumulating. Over 2500 users have logged into the system and played
at least one game. The average number of games played by each human is 53;
the most games played by one player is 5028. Sixteen players have played
over 1000 games.



Figure 2 Internet Tron Results (plot title: Tron games won/total, sampling rate = 1000).



Our basic measure of performance is the “win rate” - the percentage of games 
that the GP players have won in playing against humans. As shown in figure 2, 
this rate has steadily risen from approximately 30% initially to more than 60% 
over a period of several months, resulting in a robust GP population capable of 
beating a wide variety of opponents. This “database” of players, along with the 
Java Tron environment, provide an excellent resource for testing and comparison 
with other methods for training artificial players. 

3 Implementation and Results. 

In the present work, we develop Tron players controlled by two-layer feed-forward 
neural networks with 5 hidden units. Each network has 8 inputs - one for each 
of the sensors described earlier. There are 3 output units, representing each of 
the three possible actions (as above). Of these, the output unit with the largest 
activation determines the selected action for the current time step. 
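The action selection described above can be sketched in plain Python. The source does not state the activation function, the use of bias weights, or the weight initialisation, so the tanh units, the extra bias column, the Gaussian initial weights, and the order of the actions are assumptions, and all names are hypothetical.

```python
import math
import random

ACTIONS = ("left", "right", "straight")  # assumed ordering of the 3 outputs


def make_network(n_in=8, n_hidden=5, n_out=3, rng=None):
    """Random two-layer network; each row carries one extra bias weight."""
    rng = rng or random.Random(0)

    def layer(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols + 1)]
                for _ in range(rows)]

    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}


def forward(layer, inputs):
    """Apply one layer of tanh units (assumed activation) to the inputs."""
    extended = list(inputs) + [1.0]  # constant input for the bias weight
    return [math.tanh(sum(w * x for w, x in zip(row, extended)))
            for row in layer]


def select_action(network, sensors):
    """Pick the action whose output unit has the largest activation."""
    hidden = forward(network["hidden"], sensors)
    outputs = forward(network["output"], hidden)
    return ACTIONS[outputs.index(max(outputs))]
```

With eight sensor readings as input, `select_action` returns one of the three moves for the current time step.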

We train the network using an evolutionary hill-climbing algorithm in which 
a champ neural network is challenged by a series of mutant networks until one 
is found that beats the champ; the champ’s weights are then adjusted in the 
direction of the mutant: 

1. mutant <- champ + gaussian noise

2. mutant plays against champ

3. if mutant beats champ, champ <- (1 - a) * champ + a * mutant
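The three steps above can be sketched as a short loop. This is a sketch under assumptions rather than the authors' code: the weights are flattened into one list, `beats` stands in for playing a game of Tron, each loop iteration counts as one challenge, and the noise level `sigma` is a hypothetical parameter not given in the source.

```python
import random


def hill_climb(champ, beats, generations, a=0.33, sigma=0.05, rng=None):
    """Evolutionary hill-climbing with a mutant influence factor `a`.

    champ: flat list of network weights.
    beats: callable (mutant, champ) -> bool, e.g. the result of a game.
    Returns the final champion weights; the input list is not modified.
    """
    rng = rng or random.Random(0)
    for _ in range(generations):
        # Step 1: mutant <- champ + gaussian noise
        mutant = [w + rng.gauss(0.0, sigma) for w in champ]
        # Steps 2 and 3: move the champ towards a mutant that beats it
        if beats(mutant, champ):
            champ = [(1 - a) * c + a * m for c, m in zip(champ, mutant)]
    return champ
```

Setting `a` to 1.0 would replace the champion outright, while smaller values preserve more of the champion's existing weights.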
Using this neural network architecture, three players were evolved. Network
nn-0 was evolved for 1200 generations, networks nn-1 and nn-2 for 50000
generations each. The parameter a, which we refer to as the mutant influence factor,
was set to 0.5 for nn-0 and 0.33 for nn-1 and nn-2. The network weights were
saved every 100 generations and tested against five of the best GP players
hand-picked from our Internet experiment (referred to as GP players 510006, 460003,
480001, 540004 and 400010). Note that the GP players were used purely for
diagnostic purposes and had no effect on the evolving networks.




Figure 3 Network nn-1, generation 40500 (darker trail, starting on left)
versus GP players (starting on right). Panels include a. versus GP-510006
and d. versus GP-540004.



Many of the GP players exhibit distinctive features, permitting loose
characterisation of behaviours. For example, players GP-510006 and GP-460003 follow
similar strategies of trying to fill the arena in a contracting spiral, first carving
an outline and then gradually moving inward, attempting to reduce the area
available to the opponent. They exhibit a consistent inter-line spacing of
approximately 12 and 4 pixels, respectively. When confined, both players seem to
“panic”, making a series of tight turns until either crashing or out-lasting their
opponent.

Note that the GP player numbering is consistent with our previous papers on this work.

Player GP-480001 often performs a diagonal “coat-hanger” manoeuvre, turning
at angles of 45° or 135° by alternating left and right turns in rapid succession.
Player GP-540004 is more aggressive, darting about the space in a seemingly er- 
ratic manner looking for opportunities to confine its opponent. Finally, player 
GP-400010 (shown in figure 4b) seems more defensive, gradually moving outward 
in a tight spiral pattern with an inter-line spacing of 1 or 2 pixels. 




Figure 4 Defensive strategies of nn-1 and GP players: a. generation 20000 vs
GP-480001; b. generation 10000 vs GP-400010.



The results of playing every 100th generation network against the five GP 
players are shown in figure 5, smoothed by aggregation. The performance of 
network nn-1 can be seen to improve gradually, peaking at around 70% after 
40000 generations. In particular, the network sampled at generation 40500 was 
able to beat all five GP players. 




Figure 5 Neural network results. 
It is interesting to note that the neural network players do not seem to evolve
distinctive features in quite the same way as the GP players (see figure 3).

Figure 6 illustrates the evolution of network nn-1. Each game shown is 
against GP player 510006. The network makes early mistakes (a), but quickly 
learns a defensive strategy (b), then changes its behaviour (c), and finally wins 
again by “boxing in” its opponent (d). 
Figure 6 Evolution of Network nn-1 (starting on left),
versus GP-510006 (starting on right)



Network nn-0 (not shown) developed a fragile defensive strategy similar to 
GP-400010, filling the screen as slowly as possible in a series of expanding spirals. 
This method works well against GP-400010, an opponent with a similar strategy. 
It also happens to beat GP-510006 consistently, but loses almost all the time to 
the other three players. 
4 Discussion

Co-evolutionary systems - particularly self-learning hill-climbers - often develop 
brittle strategies that perform well against a narrow range of opponents but 
are not robust enough to fend off strategies outside their area of specialisation. 
This brittleness has been overcome in a number of instances, but usually in 
domains that involve some form of non-determinism. Even though Tron is a 
deterministic domain, our self-learning hill-climbers have learned the task well 
enough to perform capably against a selection of GP players with a variety of 
different strategies. 

The fact that performance oscillates - as measured by our sample of 5 GP 
players (figure 5) - shows on the one hand that our NN representation for Tron 
players can be very effective: nn-1 at generation 40500 beats all the GP opponents. 
On the other hand, oscillations may indicate that the landscape is deceptive for
our hill-climbing algorithm, i.e. going up in one sense may imply going down in 
another. Further experiments will help us explore these issues. 

It is interesting to note that nn-0, with a mutant influence factor of a = 0.5, 
developed a fragile strategy which plays an almost identical game against every 
opponent, while nn-1, with a = 0.33, developed an ability to react to different 
opponents in a robust manner. The practice of making only a small adjustment 
in the direction of the mutant - determined by the parameter a - was originally 
introduced in [9] on the assumption that most of the strategies of the well-tested 
champion would be preserved, with only limited influence from the mutant. 
However, it may also be that a lower value of a improves the robustness of the 
champion by exposing it to a greater variety of mutant challengers. Indeed, we 
conjecture that there may be an optimal value for a - which likely varies from 
one task to another. We plan to explore these issues in further experiments. 

In future work we intend to make more extensive studies of Tron and other 
domains, in the hope of gaining more insight into the role of non-determinism in 
co-evolutionary learning, and the relative importance of other factors. We also 
plan to make the neural network players available in the Tron Internet system. 
Look for them on our web site... http://www.demo.cs.brandeis.edu/tron. 
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Abstract. This paper describes a novel feature selection algorithm which 
utilizes a genetic algorithm to select a feature subset in conjunction with 
the weights for a three-layer feedforward network classifier. The algo- 
rithm was tested on the “ionosphere” data set from UC Irvine, and on 
an artificially generated data set. This approach produces results
comparable to those reported for other algorithms on the ionosphere data,
but using fewer input features and a simpler neural network architecture.
These results indicate that tailoring a neural network classifier to
a specific subset of features has the potential to produce a classifier with
low classification error and good generalizability.

Keywords: Genetic algorithm; neural network; classification; ionosphere



1 Introduction 

Feature selection is the process of selecting an optimum subset of features from
the much larger set of potentially useful features available in a given problem
domain [1]. The “optimum subset of features” which is the aim of the feature
extraction algorithm can be defined as “the subset that performs the best under
some classification system”, where “performs the best” is often interpreted
as giving the lowest classification error.

The feature selection problem has been investigated by many researchers, and 
a wide variety of approaches has been developed, including statistical approaches, 
decision trees, neural networks, and various stochastic algorithms (see [2] for an
overview). One approach which has received considerable recent attention is the
use of genetic algorithms (GAs) for feature selection. 

A GA is an optimization tool inspired by biological evolution. A GA can find 
near-global optimal parameter values even for poorly behaved functions, given 
sufficient time. As such, its applicability to the optimum feature subset selection 
problem is apparent, and a number of authors have investigated the use of GAs 
for feature selection mm- 

The actual subset of features which is optimal for a given problem will depend 
upon the classifier used. In many real-world problems, classes are not linearly 
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separable, and a non-linear classifier, such as a neural net, is required. GAs have 
been used in combination with neural nets in several ways. They have been used 
to select feature sets for classification by a neural net trained using conventional 
learning algorithms (e.g. [6,7]), and to evolve the weight vector and/or architecture
of a neural net (for a review see [8]).

Incorporating the neural net classifier as part of the objective function of 
the GA requires training a neural net for each fitness evaluation performed by 
the GA. Due to the stochastic element of neural net training, each net should 
be trained several times with different initial weights in order to properly assess 
performance; at least 30 repeats has been suggested in [^. This makes this 
approach computationally prohibitively expensive. The solution has usually been 
to use a simpler, related classifier in the GA [^, or to only partially train a subset 
of the neural nets in each generation m- 

A different approach, described below, is to combine the evolution of the 
weight vector for a neural net with the selection of a feature subset. In this way 
it should be possible to use a GA to evolve a nonlinear classifier explicitly tuned 
to a feature subset, without excessive computational overhead. This combines 
the advantages of using a small feature set, as discussed above, with those of 
having a classifier developed specifically for that feature set, and should lead to 
high accuracy of classification using a small number of features. We decided to 
see whether such an approach is feasible and whether it could, in fact, lead to 
the development of an effective, generalizable classifier. 

2 Method 

The task chosen to test the algorithm was classification of the “ionosphere” data
set from the UCI Repository of Machine Learning Databases. The dataset
consists of radar returns from the ionosphere. There are 350 returns, each consisting
of 34 features, normalized to the range [-1, 1], with a target value of “good”
or “bad”, which was recoded for this application as “0” and “1” respectively.
The features were originally 17 discrete returns, each comprising a real and an
imaginary part. These parts are used as separate features, leading to the final
array of 34 features. The data was initially divided into a training set of 200
returns, of which 100 were good and 100 bad, and a test set of 150 returns, of
which 123 were good and 27 were bad. This is the partitioning used by Sigillito
et al. [12], who first collected and analyzed the data.

The distribution of good/bad returns in the test set is very uneven. To assess
the effect of the partitioning of the data on the classifier developed, a further 5
data partitionings were used. In each case 63 good and 63 bad cases were selected
at random from the data set and these 126 cases were used as the training set,
with the remaining 224 forming the test set. This results in a smaller training
set, but a test set with a somewhat greater proportion of bad returns.

A three-layer feedforward net was used, with a sigmoid activation function 
on the hidden and output units. The network had six input units, three hidden 
units, and a single output unit. The GA could thus select a set of six features or, by
setting all the weights from a feature to 0, could select fewer than six features. This
choice of network architecture was a compromise between an attempt to make 
the network as small as possible, reducing the number of parameters to be fitted, 
and thus the complexity of the problem which the GA is to solve, and the need to 
make it complex enough to successfully classify the nonlinear input data. Three 
hidden units were used because they proved to be sufficient to solve the problem 
— experiments with different network architectures indicate that more hidden 
units, as would be expected, produce a more accurate classifier. However, since 
the objective of this study was to demonstrate that a very simple classifier can 
perform well if its inputs are carefully chosen, the simplest successful architecture 
was used. 

Each net was encoded as a single binary string 160 bits long, with each eight
bits representing a single integer in the range 0-255 (plain unsigned binary coding).
Alternative representations, such as Gray coding, may have advantages in a
GA, but were not considered in this study.

Each integer represented either a feature or a weight. The first eight bits 
represented the first feature, feature 1; the next eight the weight from feature 1 
to hidden unit 1; the next eight the weight from feature 1 to hidden unit 2, and 
so on. The weights associated with a given feature were located close to that 
feature to facilitate the development of schema during evolution. Features and 
weights were both coded in eight bits, so that they were both subject to the 
same chance of mutation and crossover. 

Integers representing features were rescaled to lie between 0 and 33 by multiplying
by (33/255). The result was rounded, and this integer was interpreted as
an index to a particular feature. Integers representing weights were scaled to lie
in the range -3.0 to +3.0, using $w = (k \times (2 \times 3.0)/255) - 3.0$, where $k$ is
the 8-bit integer. This weight range was chosen because it is fairly small and
centred around 0.0, which is the centre of the sigmoid activation function used.
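
The decoding just described can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes most-significant-bit-first genes, 34 features indexed 0 to 33, and a layout in which each feature gene is followed by its weights to the hidden units. All function names are hypothetical.

```python
def bits_to_int(bits):
    """Read a sequence of 0/1 values as an unsigned binary integer (MSB first, assumed)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def decode_feature(k, n_features=34):
    """Rescale an 8-bit gene k (0-255) to a feature index in 0..n_features-1."""
    return round(k * (n_features - 1) / 255)

def decode_weight(k, w_max=3.0):
    """Rescale an 8-bit gene k (0-255) linearly onto [-w_max, +w_max]."""
    return k * (2 * w_max) / 255 - w_max

def decode_chromosome(bits, n_inputs, n_hidden):
    """Split a bit string into 8-bit genes and decode feature/weight groups.

    Each input occupies (1 + n_hidden) genes: the feature index followed by
    its weights to the hidden units. Any remaining genes are decoded as weights.
    """
    genes = [bits_to_int(bits[i:i + 8]) for i in range(0, len(bits), 8)]
    group = 1 + n_hidden
    inputs = []
    for g in range(n_inputs):
        chunk = genes[g * group:(g + 1) * group]
        inputs.append((decode_feature(chunk[0]), [decode_weight(k) for k in chunk[1:]]))
    remaining = [decode_weight(k) for k in genes[n_inputs * group:]]
    return inputs, remaining

# One input group: feature gene 0, then three weight genes of 255 each.
example_bits = [0] * 8 + [1] * 24
decoded_inputs, decoded_rest = decode_chromosome(example_bits, n_inputs=1, n_hidden=3)
```

Because the same feature index can be decoded from more than one gene, duplicate selections are possible under this encoding.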

The objective function for the GA attempts to minimize the mean squared 
error of the net over the entire training data set. The fitness of an individual 
chromosome is (1 — MSE). 

The parameter settings used for the GA were chosen by trial and error on
the training set only. The initial values were taken from the literature. The final
configuration was: Population Size 50; Mutation Rate 0.01; Crossover Rate 0.02;
Generation Gap 0.5.

Individuals were chosen to reproduce on the basis of their fitness, using
roulette wheel selection, and elitist generational replacement was used,
with the fittest 50% of parents and the fittest 50% of offspring making up the
next generation.
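
A minimal sketch of these two steps is given below. It assumes non-negative fitness values (true for 1 - MSE when the MSE does not exceed 1) and breaks ties by position; the helper names are hypothetical and crossover and mutation are omitted.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off

def next_generation(parents, parent_fit, offspring, offspring_fit):
    """Elitist replacement: fittest half of the parents plus fittest half of the offspring."""
    def top_half(pop, fit):
        ranked = sorted(zip(fit, range(len(pop))), reverse=True)
        return [pop[i] for _, i in ranked[:len(pop) // 2]]
    return top_half(parents, parent_fit) + top_half(offspring, offspring_fit)

survivors = next_generation(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.2],
                            ["w", "x", "y", "z"], [0.3, 0.4, 0.8, 0.7])
chosen = roulette_select(["a", "b"], [0.0, 1.0], random.Random(0))
```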

The GA was run five times on the Sigillito-partitioned training dataset, with 
a different random number seed each time. The course of training was followed 
by recording the maximum fitness observed in each generation, and training was 
stopped when the population appeared to have converged. The fittest individual 
in the population was then decoded and used to classify the test and training 
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sets. The GA was then run on each of the 5 randomly partitioned data sets, with 
a random number seed of 1 each time, using the same training methodology. 



3 Results and Analysis 

The results of the training runs are recorded in Table 1. For each trial, the fittest
neural net was used to classify each case in both the training and the test data
sets. The optimum threshold for each classifier was selected and the percentage
of cases correctly classified using that threshold is recorded in Table 1.



Table 1. Runs of the Neural GA with different Random Number Seeds 



Seed   Max Fitness   Generations   Features Selected   AUC (Train)   AUC (Test)   % Correct (Train)   % Correct (Test)
1      0.901581      687           4,5,7,20,26,28      0.892         0.902        90.5                90.8
345    0.861244      688           4,5,9,9,15,23       0.901         0.808        88.6                90.8
7356   0.892623      609           2,4,5,12,13,14      0.898         0.947        90.0                93.4
629    0.899525      691           2,4,5,7,9,13        0.941         0.976        90.0                96.1
30     0.894945      677           4,7,8,9,20,23       0.927         0.859        86.6                90.8
Ave    0.889984                                                                   89.1                92.4


The average percentage of cases in the unseen test set classified correctly is 
92.4% — somewhat higher than that achieved for the training set, at 89.1%. This 
implies that the test set is easier to classify than the training set, a conclusion
which is supported by Sigillito et al. [12], who note that bad returns are more diverse than
good ones, and hence presumably harder to characterize. Bad returns comprise
only 18% of the test set, but form 50% of the training set. When different, and 
more even, partitionings of the data into training and test sets were used, the 
average area under the ROC curve on the training set was 0.979, and on the 
test set was 0.937. This supports the premise that the good returns are easier to 
classify, and suggests that conclusions about the generalizability of any classifier 
developed on this data set are limited by this variability. 

The accuracy of the nets produced by different runs varied from 86.6% to 
90.5% for the training set, and from 90.8% to 96.1% for the test set, indicating 
that the system is quite robust with respect to initial conditions. 

A Receiver Operating Characteristic (ROC) curve was constructed for each 
set by taking the output of each case in each run and thresholding the entire data 
set at a number of points between 0 and 1. The proportion of correctly classified 
“good” values (true positives) and incorrectly classified “bad” values (false pos- 
itives) was computed for each threshold, and these values plotted against each 
other to give the ROC curves (Fig. 1). A ROC curve provides a concise graphic
depiction of the overall performance of a classifier; for any given classifier it is
possible to operate at any point on the ROC curve by using the corresponding
threshold for classification. The area under the ROC curve may be used as a
measure of the power of the classifier. It ranges from 0.5 (no power - random
classification) to 1.0 (perfect classification of all cases). The columns labelled
“AUC” in Table 1 give the area under the ROC curve for each classifier.
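
The thresholding procedure can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: a higher score is taken to indicate the positive class, thresholds are evenly spaced between 0 and 1, and the area is computed with the trapezoidal rule. The function names are hypothetical.

```python
import numpy as np

def roc_curve(scores, is_positive, thresholds):
    """Return false- and true-positive rates for each threshold."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_positive = scores >= t
        tpr.append(np.sum(predicted_positive & is_positive) / np.sum(is_positive))
        fpr.append(np.sum(predicted_positive & ~is_positive) / np.sum(~is_positive))
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC points, sorted by FPR and then TPR."""
    order = np.lexsort((tpr, fpr))
    x, y = fpr[order], tpr[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy data: two negatives (0.1, 0.4) and two positives (0.35, 0.8).
scores = [0.1, 0.4, 0.35, 0.8]
is_positive = [False, False, True, True]
fpr, tpr = roc_curve(scores, is_positive, np.linspace(0.0, 1.0, 101))
auc = area_under_curve(fpr, tpr)
```

In the toy data three of the four positive/negative pairs are ranked correctly, so the area comes out at 0.75.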

The ROC curves in Fig. 1 vary more than the single figure for accuracy would
indicate, implying that, although the optimum performance of the classifiers is 
similar, the overall performance is not so uniform — some feature subsets appear 
to be more effective than others over a wide range of thresholds. 

Fig. 1 also shows the results achieved by Sigillito et al. [12] on this data set. They
used a linear perceptron, which achieved an accuracy of 90.7%, a true positive 
value of 95.9% and a false positive value of 33.3%; a nonlinear perceptron, which 
achieved an accuracy of 92.0%, a true positive value of 98.4% and a false positive 
value of 37.0%; and a number of multilayer perceptrons (MLPs) with 3-15 hidden 
units. The best of these achieved 98% accuracy, with a true positive value of 100% 
and a false positive rate of 11.1%. 




Fig. 1. ROC Curves for 6 Input Neural GA Classifier. A, B and C mark the results
obtained by Sigillito et al. (1989) on this data set with varying classifiers (see text for
details)

Fig. 1 illustrates the difference between the point accuracy of a classifier, and
its overall performance. While the accuracy achieved by [12] using an MLP with 
15 hidden nodes and 34 inputs was considerably better than that achieved by 
the Neural GA at what we selected as “optimum” threshold, all their classifiers 
are operating within the range of the ROC curves for the neural GA. That is, 
this classifier could operate with equal percentage accuracy with the appropriate 
tradeoff between true positive and false positive rates. 

The variability observed due to different partitionings of the data set makes it 
difficult to quantify the generalizability of the classifier evolved by this algorithm. 
In order to overcome this problem, the algorithm was tested on an artificially 
constructed data set. This consisted of points from a set of multidimensional 
nested spheres. This problem is clearly non-linearly separable, but is readily 
visualized by humans. The data set consisted of a two-dimensional, two-class 
problem. 1,000 points from each class were generated as a training set, a total of 
2,000 training cases. An equal number of points was generated for the test set. 
The spheres are centred on 0.5, with the inner sphere having a radius of 0.25, 
and the outer forming an annulus of thickness 0.25 around the inner sphere. 

Seven “features” were generated. Features 0 and 1 are the x and y coordinates
for the data point. Features 2 and 5 are random numbers in the range (0,1);
feature 3 is half of (feature 0 plus a random number in the range (0,1)); feature
4 is 2/3 of (feature 0 plus feature 1); and feature 6 is half of (feature 2 plus feature
5). All feature values are in the range (0,1). The optimum subset of features for
this data set is thus (0,1). The neural network consisted of two input nodes, five
hidden nodes and a single output node.
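
One way to generate such a data set is sketched below. The sampling scheme is an assumption (points drawn uniformly by area within the inner disc and the surrounding annulus, label 0 for the inner class); the text does not say how the points were drawn. The function name is hypothetical.

```python
import numpy as np

def make_nested_circles(n_per_class, rng):
    """Two-class, two-dimensional nested-circle data with the seven features described above."""
    def sample(r_lo, r_hi, n):
        # Uniform by area: the radius is the square root of a uniform draw in r^2.
        r = np.sqrt(rng.uniform(r_lo ** 2, r_hi ** 2, n))
        theta = rng.uniform(0.0, 2.0 * np.pi, n)
        return 0.5 + r * np.cos(theta), 0.5 + r * np.sin(theta)

    x_in, y_in = sample(0.0, 0.25, n_per_class)    # inner circle, radius 0.25
    x_out, y_out = sample(0.25, 0.5, n_per_class)  # annulus of thickness 0.25
    f0 = np.concatenate([x_in, x_out])
    f1 = np.concatenate([y_in, y_out])
    n = 2 * n_per_class
    f2 = rng.uniform(0.0, 1.0, n)
    f5 = rng.uniform(0.0, 1.0, n)
    f3 = 0.5 * (f0 + rng.uniform(0.0, 1.0, n))
    f4 = (2.0 / 3.0) * (f0 + f1)
    f6 = 0.5 * (f2 + f5)
    features = np.column_stack([f0, f1, f2, f3, f4, f5, f6])
    labels = np.concatenate([np.zeros(n_per_class, dtype=int),
                             np.ones(n_per_class, dtype=int)])
    return features, labels

train_x, train_y = make_nested_circles(1000, np.random.default_rng(0))
```

Only features 0 and 1 determine the class exactly; features 3 and 4 carry the same information mixed with noise or with each other, and features 2, 5 and 6 are pure noise.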

Over the course of several runs it became apparent that there was a strong 
tendency for the GA to converge to either (0,0) or (1,1) as the features selected. 
These solutions apparently represent local minima in the search space in which 
the system tended to become trapped, although the correct solution was found 
occasionally. In order to overcome this, a penalty function was introduced into 
the fitness function, whereby the SSE for an individual was multiplied by 1.1 for 
each duplicate feature selected. 
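
The penalty can be sketched as follows, assuming that every repeated occurrence of a feature beyond its first counts as one duplicate. The function name is hypothetical.

```python
from collections import Counter

def penalised_error(sse, selected_features, factor=1.1):
    """Multiply the error by `factor` once for each duplicate feature selected."""
    duplicates = sum(count - 1 for count in Counter(selected_features).values())
    return sse * factor ** duplicates

no_penalty = penalised_error(10.0, (0, 1))   # distinct features: error unchanged
one_penalty = penalised_error(10.0, (0, 0))  # one duplicate: error scaled by 1.1
```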

The results of five runs of the algorithm on the artificial data are recorded
in Table 2.



Table 2. Runs of the Neural GA on Artificial Data 



Run   Seed   Features Selected   AUC (Train)   AUC (Test)
1     1      1,4                 0.999         0.803
2     1234   1,4                 1.000         0.999
3     999    0,1                 1.000         1.000
4     5869   0,1                 1.000         1.000
5     65     0,1                 1.000         1.000

In three of the five runs the correct solution was found. On the other two runs
the solution found was (1,4); feature 4 is a combination of features 0 and 1, and 
so provides useful, although noisier, information to the classifier. The classifiers 
based upon the correct features generalized perfectly to unseen test data, while 
those based on the noisier features generalized less well. 



4 Conclusions 

The results achieved with the neural genetic algorithm described above on real 
data are encouraging, in that they demonstrate that a simple nonlinear classifier, 
tailored to a feature subset, can perform almost as well as a much more complex 
classifier utilizing six times as many input features. There is some evidence that
many of the features in this data set are not contributing to the true classification,
and that the “more accurate” classifier is actually reflecting idiosyncrasies of the
training data set. This is not surprising, given that a three-layer neural net with
20 inputs and three hidden units is attempting to estimate 63 parameters using 
200 training exemplars, whereas a 6-input net is fitting only 21 parameters. 
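
The parameter counts quoted above are reproduced if only connection weights are counted, with no bias terms; that reading is an assumption, made here because it matches both figures:

```latex
\begin{align*}
N_{\text{params}} &= n_{\text{in}}\, n_{\text{hid}} + n_{\text{hid}}\, n_{\text{out}} \\
\text{20 inputs:}\quad & 20 \times 3 + 3 \times 1 = 63, \qquad 200 / 63 \approx 3.2 \text{ exemplars per parameter} \\
\text{6 inputs:}\quad & 6 \times 3 + 3 \times 1 = 21, \qquad 200 / 21 \approx 9.5 \text{ exemplars per parameter}
\end{align*}
```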

On an artificial data set with large amounts of data, the algorithm produces
a classifier which selects the most discriminatory features three times out of five, 
and generalizes well to unseen test data. This suggests that the algorithm can 
combine feature selection and classifier construction within the limits of the data 
set. 

Feature selection techniques are often applied to data sets having large num- 
bers of features, but relatively few cases. Division of data into training and test 
(and preferably validation) sets, while essential, further aggravates the situa- 
tion. In such a situation, the use of a subset of features is highly likely to lead to 
improved generalizability of a classifier. The algorithm described here permits 
network architecture to be kept simple, but strongly tailored to a feature subset, 
to reduce computation and enhance generalizability of the resulting classifier. 
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Abstract. This paper reports on an evolutionary programming based 
method for solving the optimal power flow problem. The method in- 
corporates an evolutionary programming based load flow solution. To 
demonstrate the global optimisation power of the new method it is ap- 
plied to the IEEE30 bus test system with highly non-linear generator 
input/output cost curves and the results compared to those obtained us- 
ing the method of steepest descent. The results demonstrate that the new 
method shows great promise for solving the optimal power flow problem 
when it contains highly non-linear devices. 

Keywords: Evolutionary programming, Optimal power flow, Optimisa- 
tion 



1 Introduction 

Recently attempts have been made by power system researchers to develop Evolutionary
Computation (EC) based optimisation techniques for solving power
system problems. EC is the study of computational systems which use ideas
and get inspiration from natural evolution and adaptation. Currently there are
three major implementations of the evolutionary algorithms: genetic algorithms
(GAs) [1,2], evolutionary programming (EP) [3,4] and evolution strategies [5].

In the last five years, the first two EC approaches have been applied to many
operating and planning problems in power systems. The GA techniques have 
been used in the reconfiguration of radial distribution networks, load-flow [Zj, 
economic active power and fuel dispatch [HW, hydrothermal scheduling m, 
unit commitment [HH] and transmission systems P. The EP approach has 
in the last two years gained some momentum and has been applied to economic
dispatch, economic/environmental dispatch [16], reactive power planning
[17] and transmission network expansion planning [18].

The works reported in the literature so far have confirmed that, as an opti- 
misation methodology, EC has global search characteristics and it is flexible and 
adaptive. Depending on the problem class, it can be very robust. For example, 
the constrained genetic algorithm based load flow algorithm [7] has been shown
to be robust and has the ability to find the saddle node bifurcation point of

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 405-412, 1999.

© Springer-Verlag Berlin Heidelberg 1999
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extremely loaded practical systems and abnormal load flow solutions. Its perfor- 
mance is superior to the conventional Newton Raphson method in these aspects. 

The work on GA load flow has been extended and implemented using the
EP methodology. Based on the work so far developed, this paper reports on the
development of a pure EP based method for solving the optimum power flow
(OPF) problem [19,20,21], which merges the load flow and economic dispatch
problems into one. The complexity of the problem is increased due to a larger
set of operational constraints of the generators and transformers in the electrical
networks. Moreover, highly nonlinear generator input/output cost characteristics
will increase the complexity of the OPF problem, rendering conventional
approaches ineffective in obtaining the global optimum solution. An OPF method
based on EC such as the one in this paper will provide a sound basis for further
development, not only to reduce the computational time requirement of the
EP-based OPF method, but also to include non-linear devices such as FACTS devices.
OPF evaluations are becoming increasingly important under the deregulation
of the electricity industry, as OPF is an essential component in any power
transmission costing and pricing calculations.

This paper reports a pure EP-based method for solving the OPF problem.
The developed method is tested and validated using the IEEE 30-bus test system.
Two study cases are presented in which the generator cost characteristics
are represented by (a) quadratic functions and (b) mixed quadratic functions
and piece-wise quadratic functions. The convergence characteristics of the new
method in the application studies are presented. The method is shown to be
powerful and promising.

2 Optimum Power Flow Problem 

The OPF problem is a combination of the load flow and economic dispatch problems.
The objective of this problem can be stated differently depending on the
aspect of interest. One of the possible objectives of OPF is the minimisation of
the power generation cost subject to the satisfaction of the generation and load
balance in the transmission network as well as the operational limits and constraints
of the generators and the transformers. The OPF problem can therefore
be regarded as a constrained minimisation problem which, in general, has the
following formulation:

$$\min f(x, u) \qquad (1)$$

$$\text{subject to: } g(x, u) = 0, \qquad h(x, u) \le 0$$

where $f(x, u)$ is the objective function in terms of the power production cost,
which depends on the generation levels of the generators. The vector of independent
variables u is given by the active powers of the generators, the voltages
of the PV nodes and transformer tap settings. The vector of dependent variables
x is given by the voltages of PQ nodes, arguments of PV node voltages and reactive
power generation. The equality constraint in equation set (1) represents the
balance of supply and load at each node in the network, that is, the load flow
problem. The inequality constraints represent operational limits of the generators
and the tap settings of transformers. In solving the OPF problem, it can be
seen that the load flow problem must also be solved. In the present work, an EP
based load flow method similar to that in [7] is employed.

3 Solving OPF using Evolutionary Programming 

The essence of the EP technique can be found in [4ji,»inmi7n^ . The essential 
components of the pure EP-based OPF algorithm are given below. Based on 
these components, an EP based procedure can be established for solving the 
OPF problem. The procedure is illustrated by the flow-chart in Fig. 1.




Fig. 1. Flow Chart of EP-based OPF 

(a) Representation of Individuals: An individual or candidate solution in a population
is represented as an array in which the values of the generator active
powers, nodal voltages and transformer tappings are stored. Slack
node power is not included in the individual. The active powers of the generators,
nodal voltages and transformer tap settings in an individual in the
initial population are set randomly within their given ranges. New individuals
are formed by mutation.

(b) Fitness function: The fitness $f_i$ of individual i, that is, the degree of optimality
of the candidate solution, is evaluated by the following fitness function:

$$f_i = \frac{M}{C_i + \sum_j VP_j + SQ} \qquad (2)$$

$$VP_j = \begin{cases} K_v (V_j - 1.0)^2 & \text{if } V_j > V_j^{\max} \text{ or } V_j < V_j^{\min} \\ 0 & \text{otherwise} \end{cases}$$

$$SQ = \begin{cases} K_q (Q_{slack} - Q_{slack}^{\max})^2 & \text{if } Q_{slack} > Q_{slack}^{\max} \\ K_q (Q_{slack} - Q_{slack}^{\min})^2 & \text{if } Q_{slack} < Q_{slack}^{\min} \\ 0 & \text{otherwise} \end{cases}$$

In equation (2), $C_i$ is the total cost of active power generation in individual
i, $VP_j$ represents a penalty applied for any voltage violations, while $SQ$ is
a penalty for reactive power violations at the slack node. As the production
cost is usually very high, its reciprocal will normally be very small. To obtain
better numerical values for the fitness of the individual for comparison
purposes, the factor M is used in equation (2) to amplify the value. M is
here set to the maximum possible cost of power generation. $K_v$ and $K_q$ are
penalty weighting constants.

(c) Generation of Candidate Solutions: New individuals are produced by mutating
the existing individuals. Let $p_i'$ be the new individual produced from
old individual $p_i$ according to:

$$x_{ji}' = x_{ji} + N(0, \sigma_{ji}^2) \qquad (3)$$

where $x_{ji}'$ and $x_{ji}$ are the values of the jth element in $p_i'$ and $p_i$ respectively,
and $N(0, \sigma_{ji}^2)$ is a Gaussian random number with a mean of zero and
a standard deviation of $\sigma_{ji}$. The expression designed for $\sigma_{ji}$ is:

$$\sigma_{ji} = (x_j^{\max} - x_j^{\min}) \left( \frac{f_{\max} - f_i}{f_{\max}} + a^r \right) \qquad (4)$$

where $f_i$ is the fitness of individual i; $f_{\max}$ is the maximum fitness within
the population; $x_j^{\max}$ and $x_j^{\min}$ denote the upper and lower limits of variable j; a
is a positive constant slightly less than unity and r is the iteration counter.
The term $a^r$ provides a decaying mutation offset, the rate of which depends
on the value of a.

(d) Selection of a new population by competition: A resultant population of individuals
is formed from the two existing populations. Each of the individuals
in the two populations competes with $N_t$ rival individuals selected randomly
from the combined populations, and a score $s_i$ is assigned to
individual i according to:

$$s_i = \sum_{j=1}^{N_t} n_j, \qquad n_j = \begin{cases} 1 & \text{if } f_i > f_r \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

where $n_j$ is the result of a tournament between individuals i and r, and $f_i$ and
$f_r$ are the fitnesses of the individuals under consideration. If the population
size is k, then the k highest ranked individuals will be selected to form the
new population from which future generations are evolved.
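
Components (c) and (d) can be sketched together as one EP generation. The sketch is illustrative only: it assumes strictly positive fitness values, clips mutated variables back into their limits, draws rivals with replacement, and breaks score ties by position. None of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def mutate(pop, fitness, lower, upper, a, r, rng):
    """Gaussian mutation with the standard deviation of equation (4)."""
    f_max = fitness.max()
    sigma = (upper - lower) * ((f_max - fitness)[:, None] / f_max + a ** r)
    children = pop + sigma * rng.standard_normal(pop.shape)
    return np.clip(children, lower, upper)  # assumption: keep variables within limits

def compete(pop, fitness, n_rivals, k, rng):
    """Score each individual against random rivals (equation (5)) and keep the top k."""
    n = len(pop)
    scores = np.empty(n, dtype=int)
    for i in range(n):
        rivals = rng.integers(0, n, size=n_rivals)
        scores[i] = int(np.sum(fitness[i] > fitness[rivals]))
    keep = np.argsort(-scores, kind="stable")[:k]
    return pop[keep], fitness[keep]

def ep_generation(pop, fitness_fn, lower, upper, a, r, n_rivals, rng):
    """One generation: mutate every parent, then select from parents plus children."""
    fitness = np.array([fitness_fn(ind) for ind in pop])
    children = mutate(pop, fitness, lower, upper, a, r, rng)
    child_fitness = np.array([fitness_fn(ind) for ind in children])
    combined = np.vstack([pop, children])
    combined_fitness = np.concatenate([fitness, child_fitness])
    return compete(combined, combined_fitness, n_rivals, len(pop), rng)

rng = np.random.default_rng(0)
pop = np.array([[0.5, 0.5], [0.2, 0.8]])
lower, upper = np.zeros(2), np.ones(2)
toy_fitness = lambda ind: 1.0 / (1.0 + np.sum((ind - 0.3) ** 2))  # stand-in objective
new_pop, new_fit = ep_generation(pop, toy_fitness, lower, upper, 0.9, 1, 3, rng)
```

In the OPF setting `fitness_fn` would run the load flow and evaluate equation (2); the toy objective above only stands in for it.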
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4 Application Example 

The results obtained when the EP-based method is applied to the IEEE 30 
bus system are presented in this section. The test system data can be found in 
m- Two study cases are presented, the first case is taken from and uses 
quadratic generator input/output cost curves which provide a convex solution
surface. The second study replaces the generator input/output cost curves for
nodes 1 and 2 by piece-wise quadratic curves to simulate different fuels or valve- 
point loading effects. 

All simulations were run on a Pentium Pro 200 MHz computer, and the algorithm
was written in the C programming language. In all cases the average execution
time was 38 seconds. Reactive power limits at all nodes except the slack node
are enforced using conventional switching within the load flow. The population
size is set at 20 and in all cases 50 iterations are executed.

(i) Quadratic Input/Output Curves 

In this study quadratic curves were used to describe the generators. This
provides a convex solution surface which is well suited to conventional optimisation
techniques such as the method of steepest descent (SD). The data for
the generators is given below in Table 1. The EP-OPF was run 100 times
and the cost of the final solution in each of the trials is graphed below in Fig.
2. The costs of the solutions produced are consistently close to that reported in
[19]. The minimum cost was $802.86, while the average was $804.42. The
details of the minimum solution are provided in Table 3.

To illustrate the convergence of the EP-OPF the average statistics over the 
100 trials are plotted in Fig. 3. It can be seen that the EP-OPF converges 
quite rapidly to the global optimum solution. 

The problem was also solved using the SD method of [l^ and its convergence 
is shown on Fig. 3. The SD method performs well on this case as expected, 
due to the convex nature of the generator input output curves. The final 
solution returned by the SD method is approximately $802.40. 



Table 1. Unit Input /Output Curves for Case(i) 



Bus   Pmin   Pmax   Qmin     Smax    Cost Coefficients
No.   (MW)   (MW)   (MVAr)   (MVA)   a      b      c
1     50     200    -20      250     0.00   2.00   0.00375
2     20     80     -20      100     0.00   1.75   0.01750
5     15     50     -15      80      0.00   1.00   0.06250
8     10     35     -15      60      0.00   3.25   0.00834
11    10     30     -10      50      0.00   3.00   0.02500
13    12     40     -15      60      0.00   3.00   0.02500

$C_i = a_i + b_i P_i + c_i P_i^2$
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Fig. 2 Solutions for Case(i) 



Fig. 3 Convergence for Case(i) 



(ii) Piece-wise Quadratic Curves 

To simulate the effects of different fuels or valve point loading, the curves
describing the generators connected to nodes 1 and 2 were replaced by piece-wise
quadratics. The data for these curves is given in Table 2. These curves
provide a non-convex solution surface for the problem, which will cause more
classical solution methods to fail in determining the global optimum.

The algorithm was run 100 times, producing a minimum cost of $648.95 and
an average cost of $654.81; the data for the minimum solution is given in
Table 3. The final costs for the 100 trials are plotted below in Fig. 4. The
convergence of the algorithm is also plotted in Fig. 5.

The SD method of [I2j was applied to this case also, the convergence is 
given in Fig. 5. In this case the method fails to find the global optimum 
solution. With reference to Fig. 5 the jump in cost at iteration 23 is a result 
of the loading of the unit connected to bus 2 crossing a discontinuity. The 
gradient information on which the method is based becomes invalid when 
a discontinuity is crossed and results in the solution converging to a local 
optimum. 
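
The discontinuity can be seen by evaluating the piece-wise curves directly. The sketch below uses the coefficients of Table 2 and assumes that a loading exactly at a breakpoint belongs to the lower segment; the names `SEGMENTS` and `unit_cost` are hypothetical.

```python
# (from MW, to MW, a, b, c) for the units at nodes 1 and 2, taken from Table 2.
SEGMENTS = {
    1: [(50.0, 140.0, 55.0, 0.70, 0.0050), (140.0, 200.0, 82.5, 1.05, 0.0075)],
    2: [(20.0, 55.0, 40.0, 0.30, 0.0100), (55.0, 80.0, 80.0, 0.60, 0.0200)],
}

def unit_cost(node, p_mw):
    """Cost C = a + b*P + c*P^2 using the first segment whose range contains P."""
    for lo, hi, a, b, c in SEGMENTS[node]:
        if lo <= p_mw <= hi:
            return a + b * p_mw + c * p_mw ** 2
    raise ValueError(f"{p_mw} MW is outside the operating range of node {node}")

below_break = unit_cost(1, 140.0)    # lower segment at the breakpoint
above_break = unit_cost(1, 140.001)  # upper segment just past the breakpoint
```

For the unit at node 1 the cost rises from about 251 to about 376.5 across the 140 MW breakpoint, so a gradient computed on one side says nothing about the other.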




Fig. 4 Solutions for Case(ii) 



Fig. 5 Convergence for Case(ii) 
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Table 2. Unit Input/Output Curves for Case(ii) 



Node   From   To     Cost Coefficients
No.    (MW)   (MW)   a      b      c
1      50     140    55.0   0.70   0.0050
       140    200    82.5   1.05   0.0075
2      20     55     40.0   0.30   0.0100
       55     80     80.0   0.60   0.0200

$C_i = a_i + b_i P_i + c_i P_i^2$



Table 3. Solutions for Test Cases 



Variable   EP (i)    SD (i)        EP (ii)   SD (ii)
P2         46.903    52.028        54.936    71.787
P3         21.210    21.095        24.567    14.798
P4         25.604    19.384        33.917    11.669
P5         12.805    13.540        18.630    (illegible)
P6         12.145    12.734        18.709    (illegible)
V1         1.048     1.050         1.046     1.006
V2         1.035     1.041         1.057     1.037
V3         1.007     1.007         1.038     1.050
V4         1.006     (illegible)   1.051     1.010
V5         1.095     1.073         1.025     1.077
V6         1.068     1.054         1.068     1.056
t11        1.04      0.98          0.91      0.98
t12        0.90      0.94          1.09      0.95
t15        0.98      0.94          0.95      0.94
t36        0.93      0.93          1.02      0.93



5 Conclusions 

An EP-based method for solving the optimal power flow problem has been
reported and demonstrated through its application to the IEEE 30-bus test system.
It has been compared with the steepest descent method and found to obtain
almost identical results in the case where the generator input/output cost
curves are quadratic. The new method is, however, superior when the generator
cost characteristics are highly non-linear. The limitations of the new method are
(i) that it is not robust enough to guarantee convergence to the global optimum
solution in the case of piece-wise quadratic cost functions and (ii) that its
computation time is long compared to classical methods.

The EP-based OPF method reported is very promising. Further work is being
undertaken to improve its robustness and to reduce its computational requirement.



References 

1. J.H. Holland. Adaptation in Natural and Artificial Systems. Ann Arbor: University
of Michigan Press, 1975.

2. D.E. Goldberg. Genetic Algorithms in Search, Optimisation and Machine Learning.
Addison-Wesley, 1989.

3. L.J. Fogel. Autonomous automata. In Ind. Res., volume 4, pages 14-19, 1962.

4. D.B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence. IEEE Press, 1995.

5. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution. Frommann-Holzboog, Germany, 1973.

6. H.P. Schwefel. Evolution and Optimum Seeking. Wiley, New York, 1995.

7. K.P. Wong, A. Li, and M.Y. Law. Development of constrained genetic algorithm
load flow method. IEE Proc. Gener. Transm. Distrib., 144(2):91-99, 1997.





412 Kit Po Wong and Jason Yuryevich 



8. D.C. Walter and G.B. Sheble. Genetic algorithm solution of economic dispatch
with valve point loading. In IEEE PES Summer Meeting, Seattle, Paper Number
SM 414-3 PWRS, 1992.

9. K.P. Wong and Y.W. Wong. Genetic and genetic simulated-annealing approaches
to economic dispatch. IEE Proc. Gener. Transm. Distrib., 141(5):507-513, 1994.

10. K.P. Wong and S.Y.W. Wong. Hybrid genetic/simulated annealing approach
to short-term multiple-fuel-constrained generation scheduling. IEEE Trans. on
Power Systems, 12(2):776-784, 1997.

11. K.P. Wong and Y.W. Wong. Development of hybrid optimisation techniques based
on genetic algorithms and simulated annealing. In X. Yao, editor, Progress in
Evolutionary Computation, Lecture Notes in Artificial Intelligence 956,
pages 372-380. Springer-Verlag, 1995.

12. K.P. Wong and Y.W. Wong. Thermal generator scheduling using hybrid
genetic/simulated-annealing approach. IEE Proc. Gener. Transm. Distrib.,
142(4):372-380, 1995.

13. S.A. Kazarlis, A.G. Bakirtzis, and V. Petridis. A genetic algorithm solution to the
unit commitment problem. IEEE Trans. on Power Systems, 11(1):372-380, 1995.

14. H. Rudnick, R. Palma, E. Cura, and C. Silva. Economically adapted transmission
systems in open access schemes - application of genetic algorithms. IEEE Trans.
on Power Systems, 11(3), 1996.

15. H.T. Yang, P.C. Yang, and C.L. Huang. Evolutionary programming based economic
dispatch for units with non-smooth fuel cost functions. IEEE Trans. on
Power Systems, 11(1):112-117, 1996.

16. K.P. Wong and J. Yuryevich. Evolutionary programming-based economic dispatch
for environmentally constrained economic dispatch. Accepted in 1997 for
publication in IEEE Trans. on Power Systems.

17. L.L. Lai and J.T. Ma. Application of evolutionary programming to reactive power
planning - comparison with non-linear programming approach. IEEE Trans. on
Power Systems, 12(1), 1997.

18. L.L. Lai, T.J. Ma, K.P. Wong, R. Yokoyama, M. Zhao, and H. Sasaki. Application
of evolutionary programming to transmission system planning. In Conf. Proc. on
Power Systems, Institution of Electrical Engineers Japan, pages 147-152, 1996.

19. O. Alsac and B. Stott. Optimal load flow with steady-state security. IEEE Trans.,
PAS-93:745-751, 1974.

20. R. Ristanovic. Successive linear programming based OPF solution. In Optimal
Power Flow: Solution Techniques, Requirements and Challenges, pages 1-9. IEEE
Power Engineering Society, 1996.

21. S.M. Shahidehpour and V.G. Ramesh. Non-linear programming algorithms and
decomposition strategies for OPF. In Optimal Power Flow: Solution Techniques,
Requirements and Challenges, pages 10-24. IEEE Power Engineering Society, 1996.




Grammatical Development of Evolutionary 
Modular Neural Networks 



Sung-Bae Cho¹,² and Katsunori Shimohara²



¹ Dept. of Computer Science, Yonsei University
134 Shinchon-dong, Sudaemoon-ku, Seoul 120-749, Korea
² ATR Human Information Processing Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
E-mail: {sbcho,katsu}@hip.atr.co.jp



Abstract. Evolutionary algorithms have shown great potential for developing
optimal neural networks that can change their architectures and
learning rules according to the environment. In order to improve
scalability and utilization, grammatical development has been considered
a promising scheme for encoding the network architecture in the
evolutionary process. This paper presents a preliminary result of applying
a grammatical development method, the L-system, to determine the
structure of a modular neural network previously proposed by
the authors. Simulation results on the recognition of handwritten
digits indicate that the evolved neural network reproduces some
characteristics of the natural visual system, such as the organization
of coarse and fine processing of stimuli in separate pathways.



1 Introduction 

There are more than hundred publications that report an evolutionary design 
method of neural networks mm- One of the important advantages of evo- 
lutionary neural networks is their adaptability to a dynamic environment, and 
this adaptive process is achieved through the evolution of connection weights, 
architectures and learning rules [4]. Most previous evolutionary neural
networks, however, impose few structural constraints, whereas a large
body of neuropsychological evidence shows that the human information
processing system consists of modules: identifiable subdivisions,
each with its own purpose or function.

This paper takes a module as a building block for evolutionary neural net- 
works previously proposed by |S], and applies a parametric L-system to the 
development of the network architecture. Each module has the ability to au- 
tonomously categorize input activation patterns into discrete categories, and 
representations are distributed over modules rather than over individual nodes. 
Among the general principles are modularity, locality, self-induced noise, and 
self-induced learning. 
and Technology in Korea.

Fig. 1. (a) Schematic diagram of the internal structure of a module; (b) Simplified
diagram of the module in (a). (Figure legend: COMMUNICATION, CONTROL, EXCITATORY,
INHIBITORY.)

2 Evolutionary Modular Neural Networks 

The basic idea is to consider a module as a building block resulting in local rep- 
resentations by competition, and develop complex intermodule connections with 
evolutionary mechanism. In computing terms, an evolutionary algorithm maps 
a problem onto a set of strings, each string representing a potential solution. 
In the problem at hand, a string encodes the network architecture and learning 
parameters in tree structure. The evolutionary algorithm then manipulates the 
most promising strings in its search for improved solutions. This process operates 
through a simple cycle of stages: 

1. creation of a population of tree-structured strings, 

2. evaluation of each string, 

3. selection of good strings, and 

4. genetic manipulation to create the new population of strings. 

The activation value of each node in the modular neural network is calculated 
as follows: 



where $w_{ij}$ denotes the weight of a connection from node $j$ to node $i$. The effective
input to node $i$, $e_i$, is the weighted sum of the individual activations of all
nodes connected to the input side of the node. The input may be either positive
(excitatory) or negative (inhibitory).

The internal structure of each module is fixed, and the weights of all intramodular
connections are non-modifiable during the learning process (see Fig. 1(a)). In a
module, an R-node represents a particular pattern of input activations to the
module, a V-node inhibits all other nodes in the module, the A-node activation is
a positive function of the amount of competition in the module, and the E-node
activation is a measure of the level of competition going on in the module. The
most important feature of a module is its ability to autonomously categorize input
activation patterns into discrete categories, which is achieved by associating each
input pattern with a unique R-node.

The process goes with the resolution of a winner-take-all competition be- 
tween all R-nodes activated by input. In the first presentation of a pattern to a 
module, all R-nodes are activated equally, which results in a state of maximal 
competition. It is resolved by the inhibitory V-nodes and a state-dependent noise 
mechanism. The noise is proportional to the amount of competition, as measured 
through the number of active R-nodes by the A-node and E-node. Evolutionary 
mechanism gives a possibility of change to the phenotype of a module through 
the genetic operators. 

The interconnection between two modules means that all R-nodes in one
module are connected to all R-nodes in the other module. These intermodule
connections are modifiable by the Hebb rule with the following equations:

$$\Delta w_{ij}(t+1) = \mu_t\, a_i \Big( [K - w_{ij}(t)]\, a_j - L\, w_{ij}(t) \sum_{f \neq j} w_{if}(t)\, a_f \Big) \qquad (2)$$

$$\mu_t = d + w_{\mu E}\, a_E \qquad (3)$$

where $a_i$, $a_j$ and $a_f$ are the activations of the corresponding nodes;
$w_{ij}(t)$ is the interweight between R-nodes $j$ and $i$; $w_{if}(t)$ is the interweight
from a neighboring R-node $f$ (of $j$) to R-node $i$; and $\Delta w_{ij}(t+1)$ is the change in
weight from $j$ to $i$ at time $t+1$. $L$ and $K$ are positive constants, and
$a_E$ is the activation of the E-node. As a mechanism for generating change and
integrating the changes into the whole system, we use an evolutionary algorithm to
determine the parameters of the above learning rule and the structure of the
intermodule connections.

Three kinds of information should be encoded in the genotype representation:
the structure of the intermodule connections, the number of nodes in each module,
and the parameters of the learning and activation rules. The intermodule weights
are determined by the Hebb rule described above. In order to
represent this information appropriately, a tree-like structure has been adopted.
An arc in a tree expresses an intermodule connection, and each node represents a
specific module and the number of nodes therein. For a more detailed description
of the evolutionary modular neural networks, see the recent publication [^.

3 Grammatical Development of MNN

Aristid Lindenmayer introduced a formalism for simulating the development of
multicellular organisms, subsequently named L-systems [7], and the vigorous
development of the mathematical theory was followed by its application to the
modeling of plants. L-systems are sets of rules and symbols that model growth
processes, and there are several variants depending on their properties. This paper
adopts one of them, the context-sensitive parametric L-system.

A parametric L-system operates on parametric words, which are strings
consisting of letters with associated parameters. The letters belong to an alphabet
$V$, and the parameters belong to the set of real numbers $\mathbb{R}$. A string with letter
$A \in V$ and parameters $a_1, a_2, \ldots, a_n \in \mathbb{R}$ is denoted by $A(a_1, a_2, \ldots, a_n)$. A
formal definition of the context-sensitive parametric L-system is as follows:

Definition 1. A parametric L-system is defined as an ordered quadruple
$G = (V, \Sigma, \omega, P)$, where

- $V$ is the alphabet of the system,

- $\Sigma$ is the set of formal parameters,

- $\omega \in (V \times \mathbb{R}^*)^+$ is a nonempty parametric word called the axiom,

- $P \subset (V \times \Sigma^*) \times C(\Sigma) \times (V \times E(\Sigma))^*$ is a finite set of productions, where
$C(\Sigma)$ and $E(\Sigma)$ denote logical and arithmetic expressions with parameters
from $\Sigma$, respectively.

The symbols : and → are used to separate the three components of a production:
predecessor, condition, and successor. Thus, a production has the format
lc < pred > rc : cond → succ.

For example, a production with predecessor $B(y)$, left context $A(x)$, right
context $C(z)$, condition $x + y + z \ge 10$ and successor $U(x + y)V(y + z)$ is
written as

$$A(x) < B(y) > C(z) : x + y + z \ge 10 \rightarrow U(x + y)\,V(y + z). \qquad (4)$$

The left context is separated from the predecessor by the symbol <, and the
predecessor is separated from the right context by the symbol >. This production
can be applied to the $B(4)$ that appears in a parametric word $\cdots A(3)\,B(4)\,C(5) \cdots$,
and replaces $B(4)$ with $U(7)V(9)$.

With this formalism, a basic element of the L-system can be defined as a
module, or as a functional group composed of several modules, of a modular neural
network. A module is denoted as $A(x, y)$, where $A$ is its name, $x$ is
the number of nodes and $y$ is the connection pointer of the module.
Consecutive module symbols imply a default forward connection
from the former module to the latter. A positive integer $y$ denotes
a forward connection, and a negative one a backward connection. A
functional group is enclosed by a pair of brackets '[' and ']'. One further addition
is the special symbol ',', which represents a disconnection between two
modules. Fig. 2 shows some typical examples of the grammar and the structures
generated by it.

Fig. 2. Some typical examples of the network structures generated by the grammar.
(a) AB, (b) A[B,C]D, (c) A(x,1)BC.

In order to see how the grammar generates various network structures,
assume that we have the following definition of an L-system for modular neural
networks.

- Alphabet: $V = \{A, B, C, D, [, ], \text{‘,’}\}$,

- Axiom: $\omega = A(100, 0)$,

- Productions:

$$A(x, y) \rightarrow A(x, 1)\,[B(x/10 - 2, y)\,B(x/10 + 2, -1)]\,C(x/10, -1) \qquad (5)$$

$$B(x_1, y_1) < B(x, y) > C(x_2, y_2) : x > 10 \rightarrow C(x/2, y)\,C(x/2, y - 1) \qquad (6)$$

$$B(x_1, y_1) < C(x, y) > C(x_2, y_2) : x > 5 \rightarrow [D(x, y), D(x, 0)] \qquad (7)$$

The sequence of strings generated by the parametric L-system specified
above is as follows:

$$A(100, 0) \Rightarrow A(100, 1)[B(8, 0)B(12, -1)]C(10, -1) \qquad (8)$$

$$\Rightarrow A(100, 1)[B(8, 0)C(6, -1)C(6, -2)]C(10, -1) \qquad (9)$$

$$\Rightarrow A(100, 1)[B(8, 0)[D(6, -1), D(6, 0)]C(6, -2)]C(10, -1) \qquad (10)$$

Fig. 3 shows the modular neural networks corresponding to each string generated
by the parametric L-system.



4 Simulation Results 

In order to confirm the potential of the proposed model, we have used the
handwritten digit database of Concordia University, Canada, which consists of
6000 unconstrained digits originally collected from dead-letter envelopes by the
U.S. Postal Service at different locations in the U.S. The size of a pattern was
normalized by fitting a coarse 10×10 grid over each digit. The proportion of
blackness in each square of the grid provided 100 continuous activation values for
each pattern. Network architectures generated by the evolutionary mechanism
were trained with 300 patterns. A fitness value was assigned to a solution by
testing the performance of a trained network with the 300 training digits, and the
recognition performance was tested on another 300 digits. The initial population
consisted of 50 neural networks having random connections. Each network
contains one input module of size 100, one output module of size 10, and a
different number of hidden modules. Every module can be connected to every other
module.

Fig. 3. The sequence of modular neural networks generated by the parametric
L-system specified in the text. (a) A(100,1)[B(8,0)B(12,-1)]C(10,-1),
(b) A(100,1)[B(8,0)C(6,-1)C(6,-2)]C(10,-1),
(c) A(100,1)[B(8,0)[D(6,-1),D(6,0)]C(6,-2)]C(10,-1).

From the simulation, we can see that evolution led to an increase in
complexity, and that new structures as well as new functionality emerged in the
course of evolution. In general, the early networks have simple structures. Some
complicated architectures also emerged in the early stages of the evolution, but
they disappeared as the search for the optimal solution matured. These early,
highly specific solutions probably overfitted peculiarities of the training set
and lacked generality.

In the test of generalization capability, for patterns similar to the trained
ones, the network produced direct activation through a specific pathway.
For unfamiliar patterns, by contrast, the network oscillated among several
pathways to reach a consensus. The basic processing pathways in this
case complemented each other, resulting in an improved overall categorization.
Furthermore, the recurrent connections utilized bottom-up and top-down
information that interactively influenced categorization in both directions.

In order to illustrate the effectiveness of the proposed model, a comparison
with a traditional neural network has been conducted. The multilayer perceptron
was selected as the traditional neural network because it is well known as a
powerful pattern recognizer. The network is constructed as 100×20×10, where
the number of hidden nodes, 20, was determined after several trials.
Table 1 reports the recognition rates of the two methods over ten different
sets of the data. As can be seen, the average recognition rate of the proposed
method is higher than that of the multilayer perceptron. Furthermore, with
the paired t-test, the "no-improvement" hypothesis is rejected at the
0.5% level of significance for all cases. This is strong evidence that the proposed
method is superior to the alternative method.
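
The paired t-test can be illustrated on the per-set recognition rates of Table 1. The following Python sketch treats the ten data sets as paired samples and tests the mean difference on nine degrees of freedom; whether the authors ran exactly this computation is an assumption, and the function name `paired_t` is illustrative.

```python
import math
import statistics

# Recognition rates (%) for the ten data sets, copied from Table 1.
proposed = [97.67, 97.67, 97.33, 98.00, 96.33, 97.00, 98.00, 97.00, 96.67, 97.67]
mlp = [95.67, 96.33, 94.67, 96.67, 93.67, 94.67, 96.00, 95.67, 94.67, 95.33]


def paired_t(a, b):
    """Return (mean difference, t statistic) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff, mean_diff / (sd_diff / math.sqrt(n))


# One-sided 0.5% critical value of Student's t with 9 degrees of freedom.
T_CRIT = 3.250

mean_diff, t_stat = paired_t(proposed, mlp)

if __name__ == "__main__":
    print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.2f}")
    print("reject no-improvement hypothesis:", t_stat > T_CRIT)
```

On these figures the statistic is far above the critical value, which is in line with the rejection reported in the text.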



Table 1. Comparison of the proposed method with a traditional neural network,
multilayer perceptron (MLP), for ten different data sets.

Data set   Proposed Method   Multilayer Perceptron
1          97.67             95.67
2          97.67             96.33
3          97.33             94.67
4          98.00             96.67
5          96.33             93.67
6          97.00             94.67
7          98.00             96.00
8          97.00             95.67
9          96.67             94.67
10         97.67             95.33
Mean       97.33             95.33
S.D.       0.57              0.92

5 Concluding Remarks

We have described a preliminary design of modular neural networks developed
by an evolutionary algorithm and a parametric L-system. The network has a modular
structure with intramodular competition and intermodular excitatory connections.
We hope that this method can give the modular neural network scalability
to complex problems, similarly to the result of [^. This sort of network
should also play an important part in several engineering tasks that exhibit
adaptive behaviors. We are attempting to make the evolutionary mechanism more
sophisticated by incorporating the concept of co-evolution.

Abstract. Optimization problems such as system reliability design and
the general assignment problem are generally formulated as nonlinear
integer programming (NIP) problems. To solve an NIP problem, it is usually
transformed into a linear programming problem. However, linear programming
problems obtained from NIP problems become large-scale. In principle, it is
desirable to deal with NIP problems directly, without any transformation.
In this paper, we propose a new method in which a neural network technique
is hybridized with genetic algorithms for solving nonlinear integer
programming problems. The hybrid GA employs the simplex search
method, which moves the chromosomes to better points.
The effectiveness and efficiency of this approach
are shown with numerical simulations of the reliability optimal design
problem.



1 Introduction 

Neural network (NN) technique is receiving much attention and applied for a 
variety of optimization problems m-m- The advantages of neural network tech- 
nique lie mainly in that the computation is completed in massively parallel 
architectures and that optimal solutions renewed parameters are adaptively ob- 
tained as new equilibrium points under the new environment. However, when 
we apply neural network techniques for a solving nonlinear integer programming 
problem, it is difficult to obtain integer solutions. 

To solve this problem effectively, we introduced genetic algorithms (GAs),
which are very powerful tools for solving such nonlinear optimization problems
and can handle any kind of nonlinear objective function and constraints.
A method for solving a nonlinear integer programming problem using the
neural network technique and the genetic algorithm (NIP/NN-GA) was recently
proposed by Gen et al. The NIP/NN-GA method uses a neural network together
with a GA to obtain the best solution: the neural network
is used to find the initial solutions of the GA. However, for
large-scale problems, the NIP/NN-GA method must search a very large number of
combinations. In this paper, we propose a new method in which the neural network
technique is hybridized with a genetic algorithm combined with the simplex
search method for solving NIP problems. The simplex search method is one of the
direct search methods; it makes the calculations time-consuming, but it is suited
to cases where great accuracy in the final solution is desired. The effectiveness
and efficiency of this approach are shown with numerical simulations on a
large-scale problem, in which the proposed method obtains the best solutions
faster than the NIP/NN-GA method.

2 Nonlinear Integer Programming Model 

The NIP problem, which maximizes a nonlinear objective function $f(x)$ of
$n$ decision variables subject to $m$ constraints, is formulated as follows:

NIP:

$$\max f(x) \qquad (1)$$

$$\text{s. t.} \quad g_i(x) \le b_i, \quad i = 1, 2, \ldots, m \qquad (2)$$

$$x = [x_1\ x_2\ \cdots\ x_n], \quad x_j \ge 0, \quad j = 1, 2, \ldots, n : \text{integer}$$

where $g_i(x)$ is the $i$th inequality constraint, $b_i$ is the right-hand side constant of the
$i$th constraint and $x_j$ is the $j$th decision variable, which takes an integer value.



3 Methods for solving NIP problem 
3.1 NP/NN method 



We first relax the integer restrictions of the NIP problem, because the neural
network technique is an approximate method suited to continuous values;
i.e., we solve the nonlinear programming (NP) problem. The maximization problem
above is easily transformed into a minimization problem by multiplying the
objective function by -1.

We can construct the energy function based on the penalty function method
for solving the NP problem. The penalty function method transforms the
constrained optimization problem into an unconstrained one. In order
to solve the NP problem, we construct the following energy function [2]:

$$E(x, k) = -f(x) + \frac{k}{2} \left\{ \sum_{i=1}^{m} [b_i - g_i(x)]_-^2 + \sum_{j=1}^{n} [x_j]_-^2 \right\} \qquad (3)$$

where $k > 0$ is the penalty parameter, $[b_i - g_i(x)]_- = \min\{0, b_i - g_i(x)\}$ and
$[x_j]_- = \min\{0, x_j\}$. Minimizing the energy function (3) leads to the following
system of ordinary differential equations:

$$\frac{dx}{dt} = -\mu \nabla_x E(x, k) \qquad (4)$$

where $\mu > 0$ is called the learning parameter.
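
A minimal numerical sketch may help to show how the penalty energy and its gradient flow behave. The toy problem below (maximize $f(x) = -(x_1 - 3)^2 - (x_2 - 2)^2$ subject to $x_1 + x_2 \le 4$, $x \ge 0$), the quadratic penalty weight $k/2$, the parameter values and the explicit Euler integration are all assumptions made for illustration; the paper itself integrates the equations with a Runge-Kutta method and different parameters.

```python
def energy_gradient(x, k):
    """Gradient of E(x, k) = -f(x) + (k/2) * (sum of squared violations)."""
    x1, x2 = x
    # Toy objective (an assumption): f(x) = -(x1 - 3)^2 - (x2 - 2)^2,
    # so -f contributes 2*(x1 - 3) and 2*(x2 - 2) to the gradient.
    grad = [2.0 * (x1 - 3.0), 2.0 * (x2 - 2.0)]
    # Constraint x1 + x2 <= 4: violation term min(0, b - g(x)).
    slack = min(0.0, 4.0 - (x1 + x2))
    grad[0] -= k * slack
    grad[1] -= k * slack
    # Non-negativity terms min(0, x_j).
    grad[0] += k * min(0.0, x1)
    grad[1] += k * min(0.0, x2)
    return grad


def neural_search(x0, k=100.0, mu=1.0, eta=1e-3, eps=1e-9, max_steps=100000):
    """Integrate dx/dt = -mu * grad E(x, k) with explicit Euler steps."""
    x = list(x0)
    for _ in range(max_steps):
        g = energy_gradient(x, k)
        new_x = [xi - mu * eta * gi for xi, gi in zip(x, g)]
        # Stop when the state changes by less than the permitted error.
        if max(abs(a - b) for a, b in zip(new_x, x)) < eps:
            return new_x
        x = new_x
    return x


x_star = neural_search([1.0, 1.0])

if __name__ == "__main__":
    print(x_star)
```

The flow settles slightly outside the feasible region, near (2.505, 1.505) instead of the constrained optimum (2.5, 1.5), because a finite penalty parameter tolerates a small violation; a larger $k$ shrinks this gap at the price of a stiffer system and a smaller stable step size.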

3.2 Genetic Algorithms

Representation and Initialization Let $x_k$ denote the $k$th chromosome in a
population as follows:

$$x_k = [x_{k1} \cdots x_{kj} \cdots x_{kn}], \quad k = 1, 2, \ldots, \textit{pop\_size}$$

where pop_size means the population size. The initial integer solution vectors are
randomly created within the region of the real-valued solutions obtained by the
neural network technique. The revised width is a parameter that limits the range
for creating the initial solution vectors.

Evaluation We evaluate the original objective function of the NIP problem as
follows:

$$\mathrm{eval}(x_k) = \begin{cases} f(x_k), & g_i(x_k) \le b_i \quad (i = 1, 2, \ldots, m) \\ -M, & \text{otherwise} \end{cases}$$

where $M$ is a large positive integer used as a penalty when the constraints
are violated.



Genetic Operators

(1) Crossover: For each pair of parents $x_1$ and $x_2$, the crossover operator
produces two children $x'$ and $x''$ as follows:

$$x' = \lfloor c_1 x_1 + c_2 x_2 + 0.5 \rfloor, \quad x'' = \lfloor c_2 x_1 + c_1 x_2 + 0.5 \rfloor$$

where $c_1, c_2 \ge 0$ and $c_1 + c_2 = 1$.

(2) Mutation: We set the revised width, and then select a mutating position
at random. Finally, we exchange the selected value for the revised value.

(3) Selection: We select the better chromosomes among parents and offspring
by evaluation value. The number to be selected is pop_size, and these
chromosomes enter the next generation. Duplicate selection is prohibited.
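
The operators above can be sketched as follows. This is an illustrative Python outline under stated assumptions: the mutation is read as replacing one randomly chosen gene by an integer within the revised width of a base value (taken here from the rounded neural network solution), infeasible chromosomes are scored as a large negative constant, and all function names are hypothetical.

```python
import math
import random


def arithmetic_crossover(p1, p2, c1):
    """Integer arithmetic crossover with weights c1 and c2 = 1 - c1."""
    c2 = 1.0 - c1
    child1 = [math.floor(c1 * a + c2 * b + 0.5) for a, b in zip(p1, p2)]
    child2 = [math.floor(c2 * a + c1 * b + 0.5) for a, b in zip(p1, p2)]
    return child1, child2


def mutate(chrom, base, rev, rng):
    """Replace one random gene by an integer within `rev` of base[pos].

    Assumption: this is one reading of "exchange the selected value for
    the revised value"; the paper does not spell the rule out.
    """
    pos = rng.randrange(len(chrom))
    mutated = list(chrom)
    mutated[pos] = max(0, base[pos] + rng.randint(-rev, rev))
    return mutated


def evaluate(x, f, constraints, big_m=10**6):
    """Return f(x) when every g(x) <= b holds, and -big_m otherwise."""
    feasible = all(g(x) <= b for g, b in constraints)
    return f(x) if feasible else -big_m


if __name__ == "__main__":
    rng = random.Random(0)
    print(arithmetic_crossover([3, 2], [1, 4], 0.25))
    print(mutate([3, 2], base=[3, 2], rev=2, rng=rng))
```

Adding 0.5 before taking the floor rounds each blended gene to the nearest integer, which keeps the children inside the integer search space.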

3.3 Simplex Search Method 

The implementation of the simplex search algorithm requires only two types of
calculations: (1) generation of a regular simplex given a base point and an
appropriate scale factor, and (2) calculation of the reflected point. The first of
these calculations is readily carried out, since it can be shown from elementary
geometry that, given an $n$-dimensional starting or base point $x^{(0)}$ and a scale
factor $\alpha$, the other $n$ vertices of the simplex in $n$ dimensions are given by

$$x_j^{(i)} = x_j^{(0)} + \delta_1 \quad \text{if } i = j \qquad (5)$$

$$x_j^{(i)} = x_j^{(0)} + \delta_2 \quad \text{if } i \ne j, \quad \text{for } i, j = 1, 2, \ldots, n \qquad (6)$$

where

$$\delta_1 = \left[ \frac{(n + 1)^{1/2} + n - 1}{n \sqrt{2}} \right] \alpha, \qquad \delta_2 = \left[ \frac{(n + 1)^{1/2} - 1}{n \sqrt{2}} \right] \alpha$$

The choice $\alpha = 1$ leads to a regular simplex with sides of unit length. The second
calculation, reflection through the centroid, is equally straightforward. Suppose
$x^{(p)}$ is the point to be reflected. Then the centroid of the remaining $n$ points is

$$x_j^{(c)} = \frac{1}{n} \sum_{i=0,\, i \ne p}^{n} x_j^{(i)}, \quad j = 1, 2, \ldots, n \qquad (7)$$

All points on the line from $x^{(p)}$ through $x^{(c)}$ are given by

$$x = x^{(p)} + \lambda (x^{(c)} - x^{(p)}) \qquad (8)$$

Here, $\lambda = 2$ will yield the desired new vertex point. Thus,

$$x = \lfloor 2 x^{(c)} - x^{(p)} + 0.5 \rfloor \qquad (9)$$
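
The two calculations can be sketched directly from the simplex-generation and reflection formulas. The Python below is an illustrative sketch, not the authors' code; the function names are hypothetical, and the integer rounding of the reflected point follows the floor-plus-0.5 convention used for the crossover operator.

```python
import math


def regular_simplex(x0, alpha=1.0):
    """Return the n + 1 vertices of a regular simplex based at x0."""
    n = len(x0)
    d1 = (math.sqrt(n + 1) + n - 1) / (n * math.sqrt(2)) * alpha
    d2 = (math.sqrt(n + 1) - 1) / (n * math.sqrt(2)) * alpha
    vertices = [list(x0)]
    for i in range(n):
        # Vertex i + 1 adds d1 on coordinate i and d2 on all others.
        vertices.append([x0[j] + (d1 if j == i else d2) for j in range(n)])
    return vertices


def reflect(vertices, p):
    """Reflect vertex p through the centroid of the other n vertices,
    rounding each coordinate to the nearest integer."""
    n = len(vertices) - 1
    others = [v for idx, v in enumerate(vertices) if idx != p]
    centroid = [sum(v[j] for v in others) / n for j in range(n)]
    return [math.floor(2.0 * centroid[j] - vertices[p][j] + 0.5)
            for j in range(n)]


if __name__ == "__main__":
    simplex = regular_simplex([0.0, 0.0, 0.0])
    print(simplex)
    print(reflect([[0, 0], [2, 0], [0, 2]], p=0))
```

With a scale factor of 1 every pair of vertices is one unit apart, which is a quick way to check the two offsets.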



4 Proposed Algorithms for solving NIP problem 

Now, we show the overall procedure for solving the NIP problem as follows:

Step 1: Set the learning parameter $\mu$, the penalty parameter $k$, the initial
solutions $x_j(0)$, the step size $\eta$ and the permissible error $\varepsilon$.

Step 2: Initial search by the neural network technique:

2-1: Construct the energy function $E(x, k)$ for solving the NP problem.

2-2: Construct the system of ordinary differential equations from $E(x, k)$
and then solve it by using the Runge-Kutta method.

2-3: If $|x_j(t + \eta) - x_j(t)| < \varepsilon$, round off to obtain the initial
solutions and go to Step 3.

Step 3: Set the population size pop_size, crossover rate $P_c$, mutation rate $P_m$,
maximum generation max_gen and the revised width rev.

Step 4: Optimal search by the genetic algorithm:

4-1: Generate the initial population.
Decide the range of the decision variables from the rounded solutions
and the revised width, and then generate the initial population.

4-2: Evaluation
Calculate the evaluation function.

4-3: Genetic Operations

4-3-1: Crossover (arithmetical crossover)

4-3-2: Mutation (one-point mutation)

4-3-3: Selection (elitist selection)

4-4: Reorganization of population

4-4-1: Generate a regular simplex for each selected chromosome by
$x_j^{(i)} = x_j^{(0)} + \delta_1$ if $i = j$, else $x_j^{(i)} = x_j^{(0)} + \delta_2$, for $i, j = 1, 2, \ldots, n$.

4-4-2: Calculate the centroid vector through which each chromosome is
reflected. The centroid of the remaining $n$ points is

$$x_j^{(c)} = \frac{1}{n} \sum_{i=0,\, i \ne p}^{n} x_j^{(i)}, \quad j = 1, 2, \ldots, n$$

4-4-3: Reflect all genes through the centroid:

$$x = \lfloor 2 x^{(c)} - x^{(p)} + 0.5 \rfloor$$

where $x^{(p)} = \arg\max\{\mathrm{eval}(x^{(i)}) \mid i = 0, 1, \ldots, n\}$.

4-4-4: If the evaluation value of the reflected chromosome is better
than those of the chromosomes selected in Step 4-3-3, put it into the
population of the next generation.

4-5: Termination condition
If the generation number equals the maximum generation, then go to
Step 5. Otherwise, go to Step 4-2.

Step 5: Output the solution.



5 Numerical Examples 

In this section, numerical examples of NIP problems are solved by the proposed
method, and we make a comparative study of the simple GA, the NIP/NN-GA method
and the proposed method.

5.1 Example 1:

We consider the following NIP problem with 5 decision variables and 3
constraints:

$$\max f(x) = \prod_{j=1}^{5} \{1 - (1 - R_j)^{x_j}\}$$

$$\text{s. t.} \quad \sum_{j=1}^{5} P_j x_j^2 \le 100$$

$$\sum_{j=1}^{5} C_j \{x_j + \exp(x_j/4)\} \le 175$$

$$\sum_{j=1}^{5} W_j x_j \exp(x_j/4) \le 200, \quad x_j \ge 1, \quad j = 1, \ldots, 5 : \text{integer}$$

where the coefficients for this problem are shown in Table 1.

Table 1. Coefficients for Example 1

j      1      2      3      4      5
R_j    0.80   0.85   0.90   0.65   0.75
P_j    1      2      3      4      2
C_j    7      7      5      9      4
W_j    7      8      8      6      9
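
To make the formulation tangible, the objective and constraints of Example 1 can be evaluated at a candidate solution. The sketch below uses the Table 1 coefficients and assumes the constraint forms $\sum P_j x_j^2 \le 100$, $\sum C_j\{x_j + \exp(x_j/4)\} \le 175$ and $\sum W_j x_j \exp(x_j/4) \le 200$; the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

# Coefficients of Example 1 (Table 1).
R = [0.80, 0.85, 0.90, 0.65, 0.75]
P = [1, 2, 3, 4, 2]
C = [7, 7, 5, 9, 4]
W = [7, 8, 8, 6, 9]
LIMITS = (100.0, 175.0, 200.0)


def reliability(x):
    """System reliability: product of 1 - (1 - R_j)^x_j over all stages."""
    return math.prod(1.0 - (1.0 - r) ** xj for r, xj in zip(R, x))


def constraint_values(x):
    """Left-hand sides of the three constraints (assumed forms, see text)."""
    g1 = sum(p * xj ** 2 for p, xj in zip(P, x))
    g2 = sum(c * (xj + math.exp(xj / 4.0)) for c, xj in zip(C, x))
    g3 = sum(w * xj * math.exp(xj / 4.0) for w, xj in zip(W, x))
    return g1, g2, g3


def is_feasible(x):
    """True when x is a vector of integers >= 1 meeting all constraints."""
    if any(xj < 1 or xj != int(xj) for xj in x):
        return False
    return all(g <= b for g, b in zip(constraint_values(x), LIMITS))


if __name__ == "__main__":
    x = [3, 2, 2, 3, 3]
    print(is_feasible(x), round(reliability(x), 6), constraint_values(x))
```

Under these assumptions the point [3 2 2 3 3] is feasible, and its reliability comes out close to, though not digit-for-digit equal to, the objective value reported for this example.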

We relax the NIP problem into the NP problem in order to apply the neural network
technique, and construct the energy function from it. Applying the gradient
method to the energy function $E(x, k)$ yields the system of ordinary
differential equations. With equal initial values $x_1(0) = \cdots = x_5(0)$ and the
initial-search parameters $\mu = 1000$, $k = 5000$, $\eta = 0.01$, $\varepsilon = 0.001$,
the search results in $x_1 = 2.5641$, $x_2 = 2.3410$, $x_3 = 2.19344$, $x_4 = 3.1850$,
$x_5 = 2.77524$. Next, we create the initial population based on the obtained
solutions and set the revised width for the GA. We set the parameters of the genetic
algorithm as follows: the population size is 20, the crossover rate is 0.5, the
mutation rate is 0.2, the maximum generation is 150 and the revised width is 3. The
optimal solution for this problem is $[x_1\ x_2\ \cdots\ x_5] = [3\ 2\ 2\ 3\ 3]$ and the
objective function value is 0.904514.

Table 2. Simulation Results for Example 1 (20 runs)

           Simple GA   NIP/NN-GA   Proposed method
best       0.874107    0.904489    0.904514
worst      0.704481    0.890721    0.894232
average    0.783321    0.900031    0.901274

Fig. 1. Convergence of the proposed method and the simple GA method for
Example 1 (horizontal axis: generation number, 0 to 150)

Fig. 2. Convergence of the proposed method and the NIP/NN-GA method
for Example 1 (horizontal axis: generation number, 0 to 150)

Figures 1 and 2 show the convergence of the proposed method, the NIP/NN-GA
method and the simple GA over 20 runs. In Fig. 1 and Fig. 2, the proposed
method and the NIP/NN-GA method obtain better solutions than the
simple GA. Moreover, the proposed method (NIP/NN-hGA) obtains the best
solutions faster than the NIP/NN-GA method, as shown in Fig. 2.



Hybridized Neural Network and Genetic Algorithms 427 



5.2 Example 2: 

Now, we have another NIP problem with 15 decision variables and 2 constraints: 

15 

max f{x) = Y[{1- {1- Rjfq 

i=i 

15 

s. t. ^ CjXj < 400 
i=i 

15 

WjXj < 414, Xj >1, j = 1, 2, • • • , 15 : integer 



Table 3. Coefficients for Example 2

j      1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
R_j    0.90  0.75  0.65  0.80  0.85  0.93  0.78  0.66  0.78  0.91  0.79  0.77  0.67  0.79  0.67
C_j    5     4     9     7     7     5     6     9     4     5     6     7     9     8     6
W_j    8     9     6     7     8     8     9     6     7     8     9     7     6     5     7



Table 4. Simulation results for Example 2 over 20 runs

           Simple GA    NIP/NN-GA    Proposed method
best       0.92023      0.94471      0.944819
worst      0.84744      0.93296      0.94383
average    0.89305      0.94432      0.94450



where the coefficients for this problem are shown in Table 3. In the same
way, we relax the NIP problem into an NP problem to apply the neural network
technique. After constructing the energy function and its ordinary differential
equations, we obtain the following solutions, which are used as initial values for the
GA hybridized with the neural network technique:

$x_1 = 2.717$, $x_2 = 3.707$, $x_3 = 4.367$, $x_4 = 3.385$, $x_5 = 3.059$

$x_6 = 2.497$, $x_7 = 3.514$, $x_8 = 4.299$, $x_9 = 3.514$, $x_{10} = 2.645$

$x_{11} = 3.450$, $x_{12} = 3.578$, $x_{13} = 4.232$, $x_{14} = 3.450$, $x_{15} = 4.232$

We set the parameters of the genetic algorithm as follows: the population size
is 20, the crossover rate is 0.4, the mutation rate is 0.3, the maximum number of generations is 2000
and the revised width is 2. The optimal solution for this problem is
$[x_1\ x_2\ \cdots\ x_{15}] = [3\ 4\ 5\ 3\ 3\ 2\ 4\ 5\ 4\ 3\ 3\ 4\ 5\ 5\ 5]$ and the objective function value is 0.944819, while
these values are same in 1^. 
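
As a rough check of Example 2, the following Python sketch evaluates the objective and the two resource constraints for the reported solution. The coefficient lists are transcribed from Table 3 as printed here, and the function names `objective` and `is_feasible` are illustrative assumptions rather than part of the original method.

```python
from math import prod

# Coefficients as transcribed from Table 3 (j = 1..15); transcription is an assumption.
R = [0.90, 0.75, 0.65, 0.80, 0.85, 0.93, 0.78, 0.66,
     0.78, 0.91, 0.79, 0.77, 0.67, 0.79, 0.67]
C = [5, 4, 9, 7, 7, 5, 6, 9, 4, 5, 6, 7, 9, 8, 6]
W = [8, 9, 6, 7, 8, 8, 9, 6, 7, 8, 9, 7, 6, 5, 7]


def objective(x):
    """f(x) = prod_j {1 - (1 - R_j)^(x_j)}."""
    return prod(1.0 - (1.0 - r) ** xj for r, xj in zip(R, x))


def is_feasible(x, c_max=400, w_max=414):
    """Check the integrality, lower-bound and two resource constraints."""
    cost = sum(c * xj for c, xj in zip(C, x))
    weight = sum(w * xj for w, xj in zip(W, x))
    integral = all(isinstance(xj, int) and xj >= 1 for xj in x)
    return integral and cost <= c_max and weight <= w_max


# Solution reported in the text for Example 2.
x_reported = [3, 4, 5, 3, 3, 2, 4, 5, 4, 3, 3, 4, 5, 5, 5]
print(is_feasible(x_reported), objective(x_reported))
```

With these coefficients the reported solution uses 389 of the 400 cost units and exactly 414 weight units, so the second constraint is active.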
Fig. 3. Convergence of the proposed method and the simple GA method for
Example 2 (horizontal axis: generation number)

Fig. 4. Convergence of the proposed method and the NIP/NN-GA method
for Example 2 (horizontal axis: generation number, 0 to 500)



6 Conclusion 

In this paper, we proposed a new method in which a neural network technique
is hybridized with a genetic algorithm for solving nonlinear integer programming
problems. The hybrid GA employs the simplex search method, thereby
incorporating a local search mechanism to complement the global search capabilities
of traditional GAs. The simulation results show that the proposed method
improves the chromosomes at every generation. The proposed
NIP/NN-hGA method is therefore a practical tool for solving
nonlinear integer programming problems.
Abstract. A new model for the incorporation of learning with simulated
evolution is presented. The model uses gene coordination networks
to control gene expression. Alleles at a locus compete for expression by
matching up to the network. Reinforcement is achieved through choice
dynamics where gene expression will be decided by competing environmental
states. The result is an epistasis model containing both plasticity
and mean loci. Solutions obtained are adaptive in the sense that
any changes in the environment will bring about a spontaneous self- 
organization in the pattern of gene expression resulting in a solution 
with (near) equivalent fitness. Additionally the model makes the search 
for structures through neutral or near neutral mutation possible. The 
model is tested on two standard job-shop scheduling problems which 
demonstrate the novelty of the approach. 



1 Introduction 

The paper discusses an evolutionary algorithm for adaptive search and opti- 
mization. The algorithm evolves plastic solutions capable of immediate self- 
organization in the event of an environmental change. If a gene is deleted other 
genes will alter their expression so that a solution with (near) equivalent fit- 
ness is obtained. This is accomplished through local gene interaction networks 
that coordinate gene expression. Genes regulate each other’s activity directly 
or through their products via these networks. Here the gene coordination networks
are modelled by simple feed-forward neural networks. An analogy is drawn
between the neural network and a network of interactions among information
macromolecules responsible for gene coordination (Zuckerkandl, 1997).

There are two reasons why we should be interested in a system of this type. 
The first is its role in the search for structures. Due to the plastic nature of 
individuals, mutations may have little or no influence on their fitness. Neutral 
mutations like these could play an important role in search through random 
drift due to finite population numbers (Kimura, 1968). Secondly, adaptive solutions
may be desirable for critical applications where sudden changes in the
environment must be met with a compromise solution immediately.

The next section describes in detail the gene coordination network and how
it regulates gene expression. Section 3 discusses how this network may be incorporated
in an evolutionary algorithm, which is then used in the simulation
examples in Section 4 on two standard job-shop scheduling benchmarks. The
paper concludes with a discussion.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 430-437, 1999.

© Springer-Verlag Berlin Heidelberg 1999
2 Gene Coordination

The gene coordination network’s task is to determine which gene is to be ex- 
pressed as a function of the environmental state of the genome. As genes are 
expressed their products change the environment. Through the environment or 
directly, genes are capable of regulating the activity of other genes. There is no
predetermined environmental problem for which there is a solution; the genome
constructs the environment and hence determines both the solution and problem
simultaneously (Lewontin, 1982). The construction and reconstruction of their
environments is, however, constrained by what they already are. The genome 
alters its environment based on patterns of the world which are presented to the 
gene coordination network. The network consists of nodes and connections. The 
nodes are simple processing units whose activation is the weighted sum of their 
input from the environment and from other nodes. Knowledge is stored in the 
connections among the processing units and learning takes place through the 
adjustment of their connection strengths. 

Each environmental state and corresponding response could be considered in 
isolation by the network if an absolute value judgement were given. The response
strength, gene activity, would then be an intervening variable reinforced by some
function of the individual’s fitness. In essence the network would be attempting
to predict the resulting individual’s fitness based on the current environmental
state and actions taken. Formulating reinforcement in isolation is, however, not
a simple procedure. It is believed that choice procedures may provide a better
measure of the effects of reinforcement. The measures of ‘absolute values’ are
just a result of choice dynamics (Williams, 1994). This is the approach taken
here, where gene expression is the result of choices made from a set of competing 
environmental states. 

In Fig. 1 a section of the genome model, depicted as a genetic string, is illustrated.
Two different loci (l and m) are shown. Each locus is occupied by alternative
forms of a gene which are known as alleles of one another. An individual is
imagined to contain multiple alleles. In general, however, living organisms have
only two alleles although a greater variety exists within the population. In our



Fig. 1. Genome model. Competing alleles at loci l and m where (*) marks the currently
successful allele. (Figure label: direction of transcription.)
model multiple alleles will compete for the right to be expressed but only two of
them at a time. The competition is resolved by the gene coordination network
which is modelled by a feed-forward neural network. As for any connectionist
model we must make assumptions about the number of nodes, arrangement of 
connectivity and their interaction with the environment. Since the network is 
a competitive or choice network the input will be the difference between two 
competing environmental states associated with the alleles. An array of environment
state differences at locus $l$ is denoted by $\Delta\mathcal{E}_l^{ij} = \mathcal{E}_l^{i} - \mathcal{E}_l^{j}$, where allele
$i$ is competing with allele $j$. The environmental differences are connected to a
hidden layer of nodes which are connected to two output nodes as shown in the
figure. The activations of the two output nodes $O_{\mathrm{lhs}}$ and $O_{\mathrm{rhs}}$ are real numbers
between 0 and 1. The node with the higher activity wins. Having two output
nodes allows us to instruct the network when the two choices are equivalent. For 
each competition performed two separate inquiries are made as to whether an 
allele should be chosen over the currently successful one. The results must be 
consistent and if the challenge of allele $j$ is to be successful over allele $i$ then
$O^{ij}_{\mathrm{lhs}} < O^{ij}_{\mathrm{rhs}}$ and $O^{ji}_{\mathrm{lhs}} \geq O^{ji}_{\mathrm{rhs}}$ must be satisfied, where $\Delta\mathcal{E}^{ij} = -\Delta\mathcal{E}^{ji}$. If the
above inequality does not hold then allele $i$ remains successful. With no useful
information available from the environment the network may respond in a con- 
tradictory manner and the successful gene will hold its position independently 
of changes in the environment. To achieve this the model must remember which 
allele was expressed previously at that locus. Loci which are sensitive to the 
environment are known as plasticity loci and those insensitive to the environment
as mean loci. A genome containing only plasticity loci has been labelled the
pleiotropy model by Scheiner (1998). The pleiotropy model is a special case of
the epistasis model, which also contains mean loci.

Two types of perturbations are possible at a given locus. The first is a min- 
imal perturbation where a previously expressed allele is forgotten and therefore 
will not hold through to the next generation. If the locus is a plasticity locus 
then it is likely that the allele will regain its position. If, however, the locus is 
a mean locus the allele will lose its position. The second is a structural perturbation
where the gene coordination network itself is modified. This may be a
modification of the network architecture or the connection strengths. Viewed in
this manner a structural perturbation may constitute learning. The names for
these perturbations are taken from Kauffman (1991). Additionally, a previously
expressed allele and/or other alleles may be deleted (removed from a locus) and 
others added. 
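
The two-inquiry competition described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `net` is any callable returning the pair of output activations (lhs, rhs), the acceptance test (right node wins for the input difference, left node wins or ties for the reversed difference) is one plausible reading of the consistency condition, and `toy_net` and `contradictory_net` are hypothetical stand-ins for a trained gene coordination network.

```python
import math


def challenge_succeeds(net, env_i, env_j):
    """Return True if challenger allele j should replace the successful allele i.

    Two inquiries are made: one with the difference E_i - E_j and one with
    the reversed difference E_j - E_i. The challenger wins only if both
    answers are consistent (assumed form of the consistency condition).
    """
    delta_ij = [a - b for a, b in zip(env_i, env_j)]
    delta_ji = [-d for d in delta_ij]
    lhs_ij, rhs_ij = net(delta_ij)
    lhs_ji, rhs_ji = net(delta_ji)
    return lhs_ij < rhs_ij and lhs_ji >= rhs_ji


def toy_net(delta):
    """Hypothetical network: prefers the allele with the smaller first state."""
    s = 1.0 / (1.0 + math.exp(-delta[0]))
    return (1.0 - s, s)


def contradictory_net(delta):
    """Hypothetical network that ignores its input, as at a mean locus."""
    return (0.2, 0.8)


print(challenge_succeeds(toy_net, [10.0], [4.0]))            # challenger is better
print(challenge_succeeds(contradictory_net, [10.0], [4.0]))  # incumbent holds
```

A network that answers the two inquiries inconsistently never lets a challenger through, which is how a locus behaves as a mean locus in this sketch.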



3 Evolving Gene Coordination 

The genome is represented by a string containing M loci as shown in Fig. 1.
The string is transcribed systematically from left to right processing one locus 
at a time. Within each locus there exist m alleles. Random competitions are 
held where alleles compete with the currently successful one for its right to be 
expressed. The competitions continue until a maximum number is reached or 
a time r has elapsed. The competitions are decided by the gene coordination 
network as discussed in the previous section. Each locus will be represented by 
a data structure containing a neural network for gene regulation and a list of
competing alleles. The data structure may also hold information about which 
allele was previously expressed, training history for the network, etc. In the model 
presented here the previously expressed allele will be remembered and in the next 
generation be the default successful allele. If, however, this allele happens to be 
illegal in the following generation, a random legal allele is selected as the default 
which then other alleles must compete with. 

There are a number of possible methods for evolving the connection strengths 
of the gene coordination networks. In this paper the networks will be trained 
using supervised learning with backpropagation. Training data for learning is 
sampled from the current population. During transcription the environmental 
states associated with the expressed alleles are recorded in the loci data structure. 
Once the genome has been expressed completely, its total fitness will be known. 
From a population of $N$ unique individuals a training set of size $N \times (N - 1)$
can be sampled. Should there be any useful information in this data the network 
may learn it and hopefully generalize this knowledge. 
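
The sampling of training data from the population can be sketched as follows for a single locus. The sketch assumes that `env_states[k]` holds the environmental state recorded at that locus for individual `k` and that `fitness[k]` is its total fitness (to be minimised); the function name and data layout are illustrative assumptions, and the target pattern follows the Boolean outputs described for the left and right output nodes.

```python
from itertools import permutations


def build_training_set(env_states, fitness):
    """Build ordered-pair samples (delta_E, target) for one locus.

    Every ordered pair (i, j) with i != j gives one sample, so N individuals
    yield at most N * (N - 1) samples. Pairs whose state difference is zero
    carry no information and are skipped.
    """
    samples = []
    for i, j in permutations(range(len(fitness)), 2):
        delta = [a - b for a, b in zip(env_states[i], env_states[j])]
        if all(d == 0 for d in delta):
            continue
        # Left output node: f_i < f_j; right output node: f_i > f_j.
        target = (float(fitness[i] < fitness[j]), float(fitness[i] > fitness[j]))
        samples.append((delta, target))
    return samples


states = [[1.0, 2.0], [3.0, 2.0], [0.0, 5.0]]
makespans = [55, 60, 58]
print(len(build_training_set(states, makespans)))
```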

New individuals may be formed using standard recombination operators. Loci 
may be exchanged between two or more individuals using one, two or multiple 
crossover sites. Mutation will play an important role in maintaining a diverse 
environment. Minimal perturbations will attempt to knock out successful alleles. 
It is expected that minimal perturbation will have less influence on plasticity loci. 
Structural perturbation will randomly reset the connection strengths for the gene 
coordination networks and will permanently damage loci. It is also possible to 
view the training of a network as a more complex structural perturbation. If the
new networks perform well, regardless of whether the training data used was
sensible, we expect them to be selected for. The evolutionary algorithm used in the
following section for our simulations may be summarized as follows: 

1. Initialize population and networks randomly. 

2. Loop through the following steps until the termination criterion is met:

(a) Transcribe loci by performing m random competitions at each locus with the
successful allele. Record the allele transcribed and the corresponding environmental
state. Compute the individual’s fitness and store the elite individual.

(b) Select individuals from the current population using tournament selection to
form the new population for the next generation.

(c) Train gene coordination networks in the new population at loci which have not
been trained before with a probability $P_l$. The $P_l$ parameter will essentially
dictate the initial rate of learning. Training samples are taken from the old
population. When a network has been trained a training flag T for that locus
is set to false.

(d) If the elite has been lost, inject a copy into the new population (elitist).

(e) Shuffle the new population and perform a two-point crossover in order to exchange
loci between selected individuals. The probability of crossover is $P_c$.

(f) Perform a minimal perturbation with probability $P_m$ by exchanging the currently
successful allele at a locus for another randomly chosen allele. Perform a
structural perturbation with probability $P_s$ by randomly resetting the connection
strengths for a network at a random locus. In both cases set the training
flag T to true.
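
Steps (b) and (d) of the loop above can be sketched in Python as follows. This is a minimal sketch, not the implementation used for the experiments: the tournament size of two and the choice to overwrite a random member when re-injecting the elite are assumptions, and `fitness` is any callable to be minimised.

```python
import random


def next_generation(population, fitness, rng, tournament_size=2):
    """Tournament selection (minimisation) followed by elitist re-injection."""
    elite = min(population, key=fitness)
    new_population = []
    for _ in range(len(population)):
        contestants = rng.sample(population, tournament_size)
        new_population.append(min(contestants, key=fitness))
    # Step (d): if the elite has been lost, inject a copy (assumed: random slot).
    if elite not in new_population:
        new_population[rng.randrange(len(new_population))] = elite
    return new_population


# Toy usage: individuals are integers and the best individual is 3.
rng = random.Random(0)
pop = list(range(10))
print(next_generation(pop, lambda x: (x - 3) ** 2, rng))
```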
4 Computational Results



In this section the evolutionary algorithm described in the previous section will 
be tested on two well studied job-shop scheduling problems. The problem is an 
NP-hard optimization problem and has been extensively studied. There exist
over a hundred different rules for building job schedules and so it is interesting 
to observe what type of rules emerge in the networks. The redundant nature of 
the problem also makes it an interesting test case. The goal is to assign jobs 
on machines such that the overall production time, the makespan, is minimal. 
The order by which a job may be processed on the machines is predetermined. 
Schedules are formed by systematically assigning one job after the other at its 
earliest convenience. In our experiments each allele denotes a unique job. So for
a problem with $n_j$ jobs and $n_m$ machines there will be $n_j$ alleles competing at
each locus in the string of length $n_j \times n_m$. Alleles corresponding to jobs that
have been completed are illegal and will not compete for expression.
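
How a string of expressed job alleles turns into a schedule can be sketched as follows. The sketch assumes a simple append-only reading of "earliest convenience" (each operation starts as soon as both its job and its machine are free, without filling earlier gaps); the `routes` instance is a hypothetical 2 x 2 toy problem, not one of the benchmark problems.

```python
def makespan(job_sequence, routes):
    """Decode a sequence of job indices into a schedule and return its makespan.

    routes[j] lists the (machine, duration) operations of job j in their
    predetermined order. Each occurrence of j in job_sequence schedules the
    next unscheduled operation of that job.
    """
    next_op = [0] * len(routes)      # index of the next operation per job
    job_ready = [0] * len(routes)    # time at which each job is available
    machine_ready = {}               # time at which each machine is free
    for j in job_sequence:
        machine, duration = routes[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        finish = start + duration
        job_ready[j] = finish
        machine_ready[machine] = finish
        next_op[j] += 1
    return max(job_ready)


# Hypothetical toy instance: 2 jobs, 2 machines.
routes = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
print(makespan([0, 1, 0, 1], routes))  # interleaved jobs
print(makespan([0, 0, 1, 1], routes))  # job 0 completed first
```

In this toy instance the interleaved string gives a makespan of 7 and the sequential string 11, which illustrates how the order of expressed alleles alone determines fitness.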

The test problems taken from Muth and Thompson (1963) are of sizes 6x6
and 10 x 10. The optimal makespans are known and are 55 and 930 respectively.
As a schedule is being constructed a number of features of the solution become 
available. These are the environment states which may be associated with a job 
(allele). For the simulation performed three environment states are used: the 
time a job is available, the time it may be expected to finish and the total time 
still needed to complete the entire task (work remaining). These environment 
states are used as input data for the gene coordination network which has one 
hidden layer with 6 nodes. For the training of the network the output pattern 
used is $f_i < f_j$ for the left output node and $f_i > f_j$ for the right output node,
where $f$ is the global fitness value. Note that these are Boolean operators and
that the problem is one of minimization. A sample size, twice the size of the 
Fig. 2. Perturbation results for the 6x6 (left) and 10 x 10 (right) problems. The top
figure shows the fitness at a locus which has had its successful allele deleted and all
loci to the right of it structurally perturbed. The bottom figure shows the
fitness at a locus which has only had its successful allele deleted at that point.
population, is extracted as discussed in the previous section. Samples for which
$\Delta\mathcal{E} = 0$ are ignored.

The training algorithm used is gradient descent backpropagation with momentum
and adaptive learning rate (Demuth and Beale). The log-sigmoid
transfer function returns the activations of the nodes squashed between 0 and 1. 
A network is trained for 100 epochs and if it survives it may be trained further in 
some future generations. A population size of 30 is used for the 6x6 problem and 
50 for the 10 x 10 problem. These are small population sizes, especially for the 
larger problem, but are sufficient for our purposes. The probability for crossover
is $P_c = 1$, for learning $P_l = 0.2$ and for minimal perturbations $P_m = 1/(\text{string length})$.
No structural perturbation was used for the smaller problem.
For the larger problem it was found to be useful to add a very slight chance
(0.01%) of a structural perturbation and an increase in minimal perturbations.
Thirty independent runs were taken for each problem. For the 6x6 problem 
the optimal solution was found within 40 generations. The larger problem was 
stopped after 200 generations since the solutions have essentially converged. The 
results varied from a makespan of 960 to 990. 
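
The forward pass of a network of the size used here (three environment state differences, one hidden layer of six nodes, two output nodes, log-sigmoid activations) can be sketched as follows. The random initialisation range, the bias terms and the use of the log-sigmoid in both layers are assumptions made for the sketch; training by backpropagation is not shown.

```python
import math
import random


def logsig(z):
    """Log-sigmoid transfer function, squashing activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def make_network(n_in=3, n_hidden=6, n_out=2, rng=None):
    """Create random weights; the last weight in each row is a bias (assumed)."""
    rng = rng or random.Random()

    def layer(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols + 1)] for _ in range(rows)]

    return {"hidden": layer(n_hidden, n_in), "output": layer(n_out, n_hidden)}


def forward(network, delta_env):
    """Return the activations (O_lhs, O_rhs) for an environment state difference."""

    def apply(weights, inputs):
        extended = inputs + [1.0]  # constant input for the bias weight
        return [logsig(sum(w * v for w, v in zip(row, extended))) for row in weights]

    hidden = apply(network["hidden"], list(delta_env))
    return tuple(apply(network["output"], hidden))


net = make_network(rng=random.Random(1))
o_lhs, o_rhs = forward(net, [5.0, -3.0, 10.0])
print(o_lhs, o_rhs)
```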

Results for a typical solution found for the two problems will be presented. To 
test the plasticity of the solutions found all loci are systematically perturbed by 
deleting the successful allele and putting another in its place. This can be done 
in $m - 1$ different ways at each locus. The result of this is that on average 50%
of the loci are immune to the perturbation for the 6x6 problem. Either other
loci performed its function or another phenotype was produced which gave the
same global fitness value. Fig. 3 (left) shows six different solutions resulting from




Fig. 3. Gantt charts (left) and network landscapes (right). The left shows six different
solutions obtained due to perturbation by deletion for the 6x6 job-shop problem’s
elite solution. The right shows the choice landscapes for the gene coordination networks per
locus. The locus number is given above each map, with the number of times the network
has been trained during its evolution over the generations in parentheses.
these perturbations. The bottleneck remains on the same machine but some job
orders have changed. The means by which a solution is constructed as well as
the problem itself are redundant. The bottom plot in Fig. 2 shows which loci are
most immune to the perturbation by deletion. Regions at the start of
the string are more susceptible to damage. This is reasonable since they must
essentially predict much further into the future. To eliminate the possibility that
this is a result of some other factors, such as the constraining of the solution
space, all networks to the right of the damaged locus were structurally perturbed.
The result of this is shown in the top plot in Fig. 2 and illustrates how the fate
of the last $m$ loci is determined independently of the network when $M - m$ loci
have been expressed.

Fig. 3 (right) shows the choice landscape for the 6x6 problem where the
difference in the work remaining has been set to zero. The horizontal axis is the
difference in time of completion and the vertical axis the difference in the time a job is available. On
both axes the scale is from −50 to 50. The landscape is the difference between
the two output nodes, $O_{\mathrm{lhs}} - O_{\mathrm{rhs}}$. The darker regions are positive values whereas
the lighter are negative. For example, the network at locus 24 will prefer a job
(allele) with a sooner completion time and later availability, but at locus 34 early
completion time is preferred regardless of availability. In general, scheduling jobs
with earlier completion times is preferred. Some of the nets are contradictory,
which makes their corresponding loci mean loci. Examples of these are the
first and the last locus. It is understandable that the last locus could be a mean
locus since its fate is always decided. The first locus has one environment state
always equal to zero. When we examine the choice landscapes also with respect
to the work remaining, we find that most loci are of the plasticity type.

The same perturbations by deletion were performed on a 10 x 10 solution. The
result was that on average 60% of the loci were immune to deletion. The results
are depicted in Fig. 2 (right). When an arbitrary gene is deleted, how many genes
alter their expression pattern? The single perturbation brings about a cascade of
change in the patterns of gene activation. If the mutations are neutral, either the resulting
solution - the phenotype - remains the same, or the phenotype has changed
but its fitness remains unchanged. In Fig. 4 the number of genes which alter
their expression is plotted against the locus where the deletion occurred. Only
the cases where the equivalent elite fitness was obtained are shown. Commonly it
suffices that just one additional gene changes its expression to compensate for
the damage. Changes for up to 19% of the string are also observed.




Fig. 4. Expression changes. The figure shows how many genes have changed their
expression as a function of the locus where the successful allele was deleted.
5 Discussion

In this paper we have presented a general epistasis model for adaptive search 
and optimization. The development of the model is like creating a language 
whose rules have not been formalized and where there is no a priori definition
of ‘purpose’ or ‘sense’. As the gene coordination networks evolve their meaning 
develops alongside or with it. Gene expression is controlled by these networks, 
where two alleles are matched up at a time. This is a contingent process. 

It is commonly regarded that in genetic systems information storage lies in 
the gene frequency within the population. A more efficient means of storing 
knowledge can, however, be achieved through biological networks. Formulating 
gene expression as an ‘intelligent’ process introduces new possibilities for the 
role of learning in difficult search problems and introduces naturally problem 
domain knowledge to the evolutionary algorithm. 

The effects of different learning procedures and learning
rates are currently being investigated. The importance of neutral and nearly neutral
mutations as pathways toward new structures also needs further study.
Preliminary results suggest that an increase in the rate of minimal perturbation
may be beneficial in this case. Additionally, adding perturbation by
deletion during evolution (a removal of competing alleles), which will produce
individuals with a varying number of alleles, would be a natural extension of the
evolutionary approach presented here.
Abstract. We present an approach to adaptive simulation based upon
a stratified representation of the behaviour of entities. In previous sim- 
ulation models, the adaptation of an entity’s behaviour is defined prior 
to runtime, as the conditions they might encounter are completely spec- 
ified. While entities are custom-made to function properly for particular 
situations, their behaviour could be inappropriate in other situations. 
We propose a behavioural model of adaptation which enables entities to 
modify their behaviour for a large variety of situations, and describe the 
implementation of the model for two simulations in a biological context. 
Application areas range from environmental simulation and ecological 
planning to psychological modelling and navigation simulation. 



1 Introduction 

Designers of computer simulations aim to accurately represent the entities from 
a specific domain within the real world, and these simulated entities do not need
to function in other domains. In contrast, our aim is to simulate the capacity of 
biological organisms to dynamically adapt their behaviour to different domains. 
Adaptation is defined as “the process of becoming adjusted to new conditions” 
Cl, and is important to three areas within computer science: artificial life, case- 
based reasoning and intelligent agents. Artificial life simulates the interactive 
behaviours of large populations of organisms, where each entity’s behaviour is 
locally defined in their specification |Z] . However, each entity’s behaviour is usu- 
ally derived from an original set of behaviours, and is thus constrained to operate 
in situations for which the original behaviours were designed. 

Case-Based Reasoning (CBR) simulates adaptation within the memory of 
one individual over time, with the basic unit of memory called a ’case’. Three 
types of adaptation have been investigated: additive, indexing, and case. Additive 
adaptation simply refers to the addition of a new case into memory, and has been 
compared to experiential learning [14]. Indexing adaptation modifies the index
structures of memory with the aim of optimising case retrieval efficiency [H]. 
Case adaptation occurs when cases in memory which are similar to an input 
case are modified to find a solution to an input case m- Current CBR research 

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 438- |44^ 1999. 
focuses on encapsulating large amounts of expert knowledge about a domain
within cases for specialised domains such as cooking and law [1]. CBR thus
attempts to simulate the problem-solving reasoning process of an expert about 
a particular area of knowledge through the adaptation of information, and is not 
concerned with the adaptation of these individuals’ behaviour in new situations. 

Intelligent Agents researchers, while not in complete agreement about what 
constitutes the notion of agency m, have proposed mentalistic characteristics 
for agents which are useful for considering adaptation with a biological metaphor. 
These mentalistic characteristics have been represented in the specification of 
entities in various explicit and implicit combinations. An explicit mentalistic 
representation (EMR) attempts to model psychological characteristics in the 
simulated organisms. We consider four characteristics only due to their obvious 
connection to adaptation: emotions, rationality, memory, and learning. 

— Emotional states are represented as they can significantly influence the be- 
haviour of individuals. 

— Rationality defines an entity’s behaviour as teleological (directed towards a 
goal). A hierarchy of goals such as Maslow’s Model ITU] might be included 
in a mental model. 

— Memory is required for entities to record past situations. Without it, entities 
are unable to practically observe even a simple sequence of events. Perceptual 
aliasing occurs when an entity’s behaviour depends upon something from the 
past as opposed to the present [U]. 

— Learning capability allows an entity to detect recurring sequences of events 
in their memory which relate to their goals, identify the key events, and act 
on their knowledge to their own advantage. Such learning has been classed 
as medium level [5].

An implicit mentalistic representation (IMR) relates an entity’s perceived world 
state to an entity’s action state, similar to machine learning, so psychological 
characteristics are not explicitly represented. It has been argued that it would 
be theoretically possible to create two entities, one with EMR and one with IMR 
adaptation, and be unable to tell the difference between the two by observing 
their behaviour m- However, it has been shown that it would be practically 
and computationally infeasible to represent the responses necessary for IMR for 
all the possible combinations of large numbers of conditions in situations [^. 
Nonetheless it is useful for entities to have black box responses for two cases: 
conditions which are anticipated to occur across a broad range of situations; and 
for conditions which require adaptation so quickly that the time required for an 
EMR to process the input would be prohibitive. 

In our domain we are investigating the simulation of intelligent interactions 
and require the behaviour of entities to be appropriate for any conceivable situ- 
ation. In order to be computationally feasible, all possible situations are struc- 
tured as much as possible into typical situations with changing parameters. Since 
Bakhtin [6] argued that the greatest degree of structure occurs when the world is 
authored within the structure of a story, a simulation can be placed conceptually 
within the context of a story and typical situations can be represented as ab- 
stract components of stories [U] . Our domain requires two modular components: 
a module capable of authoring situations; and a module capable of representing
the behaviour of entities with the capacity to adapt intelligently to any authored 
situation. The latter is the particular focus of this paper. 

In Section 2, we discuss our stratified behavioural model of adaptation which 
locates the various research efforts in adaptation into a unified framework. Sec- 
tion 3 details the implementation of this model for two simulations of biology, 
while Section 4 concludes and proposes the future directions of our work. 



2 Our Adaptation Model 

We now describe our stratified representation of behaviour shown in Figure 1,
and relate previous research on adaptation to our structure.



[Figure: Current Situation, Perception, Entity Behavioural Type (Unconditioned Response, Conditioned Response, Analogical Response), Action]

Fig. 1. Conceptual Model of Adaptation



The Current Situation describes all objects and events that can be observed 
by an entity through its perception. An entity is also aware of its own EMR, 
and other entities will not be aware of these internal states unless they are 
communicated in some manner. Consequently, each entity perceives a slightly 
different version of the same current situation. While an entity may not need to 
continually perceive the state of the world, all entities are in a constant state 
of action. As Bakhtin asserted, the act of doing nothing is the continuous act of
choosing not to do something else [6]. We define behaviour for entities by relating
what they perceive P of a situation to an act A. The behaviour B of an entity
over its lifetime is then a set of these tuples, where t = 0 is the first moment an
entity perceived its environment and acted, and t = N is the current situation.

$$B = \{(P_{t=0}, A_{t=0}), (P_{t++}, A_{t++}), \ldots, (P_{t=--N}, A_{t=--N}), (P_{t=N}, A_{t=N})\}$$

The Entity Behavioural Type models the three different sources of behaviour in
an entity. Each behavioural type is represented mathematically, and the
relationships between them are described.

Traditional simulation creates entities with behaviour which is of the type 
called Unconditioned Response (UR). The behaviour is called unconditioned for 
two reasons: the rules controlling it are generated before t = 0 for an entity; and 
no modifications are made to the set of behaviours during runtime. Note that
both EMR and IMR can produce unconditioned responses. UR can be described
mathematically as a set of tuples relating perceived situations P to the actions
A which are optimal, in that they match some given criteria, for situations 0 to N
which are created before runtime.

$$UR = \{(P_0, A_0), (P_1, A_1), \ldots, (P_{N-1}, A_{N-1}), (P_N, A_N)\}$$

Note that if every tuple in B maps to a tuple in UR then the entity will behave 
appropriately for all situations during its lifetime. UR has two advantages for 
entities: they respond in a specified manner every time to a specific situation; and 
there is no computational expense associated with training on the fly. However, 
for an entity to always behave in a correct manner, the complete set of possible 
world states needs to be known before runtime. With the simulation of biological 
organisms, unconditioned responses are always appropriate for the situations 
considered and should have priority over the remaining two response types. 

Conditioned Responses (CR), on the other hand, describe behaviours learnt
during runtime that also produce desirable situations for entities, so CR is
expressed similarly to UR. Such learning can be likened to situation-based additive
adaptation in case-based reasoning, since an entity must be able to store and
recall separate cases within some form of memory structure. Since the memory of
an entity is finite, a maximum size is set for a sliding memory window called short
term memory Mst. Consider an entity with the ability to remember a limited
number J of situations P and correct responses A which it has perceived
from t = 0 to the current situation t = N.

$$M_{st} = \{(P_{t=J}, A_{t=J+1}), (P_{t=J+1}, A_{t=J+2}), \ldots, (P_{t=N-2}, A_{t=N-1}), (P_{t=N-1}, A_{t=N})\}$$

However, storing all of these tuples for multiple entities is computationally
expensive, so only the situations which satisfy particular criteria should be
remembered. For our biological context, the criterion is the satisfaction of
particular goals of entities. We call this filtered memory long term memory Mlt,
where what has been learnt (CR) is related to the goal G that CR achieves. The
long term memory could be indexed by the relative importance I of the goals in a
goal hierarchy to facilitate retrieval.

$$M_{lt} = \{(G_0, CR_0), (G_1, CR_1), \ldots, (G_{M-1}, CR_{M-1}), (G_M, CR_M)\}$$
$$\text{where } I(G_0) > I(G_1) > \ldots > I(G_{M-1}) > I(G_M)$$

During the update cycle, Mst should be searched for patterns which an entity
could take advantage of, and tuples new to Mlt should be added. The best
time for elements of Mst to be filtered into Mlt is when the process of filtering
does not hinder the entity. Psychologists have suggested that this operation may
occur in humans during sleep m- 
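
The two memories described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in our simulations: the class name `Memory`, the `satisfies_goal` predicate and the `importance` function are hypothetical, and the short term window is modelled as a fixed-length queue.

```python
from collections import deque


class Memory:
    """Short term sliding window (Mst) plus goal-indexed long term memory (Mlt)."""

    def __init__(self, window_size):
        # Mst: only the most recent `window_size` (perception, action) tuples survive.
        self.short_term = deque(maxlen=window_size)
        # Mlt: maps each goal to the list of (perception, action) tuples learnt for it.
        self.long_term = {}

    def observe(self, perception, action):
        """Record one (P, A) tuple; the oldest tuple drops out when the window is full."""
        self.short_term.append((perception, action))

    def consolidate(self, goal, satisfies_goal):
        """Filter Mst into Mlt, keeping only new tuples that satisfy the goal."""
        learnt = self.long_term.setdefault(goal, [])
        for perception, action in self.short_term:
            if satisfies_goal(perception, action) and (perception, action) not in learnt:
                learnt.append((perception, action))

    def goals_by_importance(self, importance):
        """Return the goals ordered so that I(G0) > I(G1) > ..."""
        return sorted(self.long_term, key=importance, reverse=True)
```

Calling `consolidate` only while the entity is idle mirrors the suggestion that filtering should happen when it does not hinder the entity.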

The disadvantage of CR is the computational expense associated with the
creation and maintenance of the dynamic index structure. However, the
advantage of CR is that it enables entities to adapt to new situations which do not
need to be considered in their specifications, so complete domain knowledge does
not need to be integrated into their design. The speed of behavioural response
depends on how efficiently the memory is indexed, so CR is given less priority
than UR. In contrast with analogical response, a one-to-one mapping exists between
what has been learnt and the current situation, so CR is given greater priority
than AR.

The behavioural response of an entity is called an analogical response
(AR) when no UR or CR exists for the particular situation. Instead of searching
for a behaviour by matching an input case exactly to a case in long term memory,
a similarity metric is computed between the cases in memory and the input case
in an attempt to decide the 'best' behaviour to perform in an intelligent manner.
Consider a perceived situation P and correct behaviour A at time X and a long
term memory with one goal G_0.

For a perceived situation and correct response $(P_{t=X}, A_{t=X+1})$

and $M_{lt} = \{(G_0, CR_0 = ((P_0, A_0), (P_1, A_1), \ldots, (P_{N-1}, A_{N-1}), (P_N, A_N)))\}$

then $A_{t=X+1} = A_0$ where $P_0 \approx P_{t=X}$, $(P_0, A_0) \in CR_0$

AR can be compared to case adaptation in case-based reasoning, which has 
been utilised in domains of complex design, and at present we are considering 
how AR might be useful for entities in the context of simulating a story. 
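
The priority ordering of the three response types can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than our implementation: perceptions are modelled as sets of features, the Jaccard overlap stands in for the unspecified similarity metric, and the names `respond` and `similarity` are hypothetical.

```python
def similarity(p, q):
    """Jaccard overlap between two perceptions given as feature sets (assumed metric)."""
    p, q = set(p), set(q)
    return len(p & q) / len(p | q) if (p | q) else 1.0


def respond(perception, ur, cr, default_action="do nothing"):
    """Select an action by priority: UR first, then CR, then AR."""
    key = frozenset(perception)
    if key in ur:
        # Unconditioned response: rule fixed before runtime.
        return ur[key], "UR"
    if key in cr:
        # Conditioned response: exact case learnt during runtime.
        return cr[key], "CR"
    if cr:
        # Analogical response: reuse the action of the most similar learnt case.
        best = max(cr, key=lambda case: similarity(case, key))
        if similarity(best, key) > 0:
            return cr[best], "AR"
    return default_action, "none"
```

For example, with `ur = {frozenset({"food"}): "eat"}` and `cr = {frozenset({"tin sound"}): "move to kitchen"}`, the perception `{"tin sound", "cold"}` has no exact match and falls through to the analogical response.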

3 Implementation of the Adaptation Model 

To demonstrate that our stratified behavioural model was applicable in simulat- 
ing adaptation in biological environments, we chose to implement two situations 
which required the behavioural adaptation of entities: the excitation behaviour 
of bees in the presence of an intruder; and the behaviour of a group of cats who 
learn that the sound of a tin signifies that they will be fed. These simulations, 
although simple, could provide the building blocks for more complex examples. 

The bee simulation had a UR with both an IMR and EMR. Note that for
UR, all of the states which the bees could perceive were specified before runtime.
The mental states were not defined when relating perception to action,

$$UR = \{(P_0, A_0), (P_1, A_1), \ldots, (P_{N-1}, A_{N-1}), (P_N, A_N)\}$$

but were defined when relating perception to the emotional mental state.

$$UR = \{(P_0, E_0), (P_1, E_1), \ldots, (P_{N-1}, E_{N-1}), (P_N, E_N)\}$$

The cat simulation had some UR and IMR. However, the definition of the cat
UR is identical to the definition for the bees, so only the method for generating
CR through the use of EMR is described. All of the situations that the cats
perceive and their responses are stored within their short term memory Mst.
When the cats have not been engaged in moving for a particular period of time,
Mst is filtered into long term memory by considering the recurring sequence of
events which precedes the satisfaction of the goal of eating food. A conditioned
response is then created which allows the cats to satisfy their goal faster. Goals
are only used in analysing Mst and are not explicitly represented in the
CR:

$$CR = \{(P_0, A_0), (P_1, A_1), \ldots, (P_{N-1}, A_{N-1}), (P_N, A_N)\}$$

3.1 Simulation of Adaptation in Bees

Bees change their behaviour in the presence of an intruder, from a calm patrolling
state to an angry attacking state [12]. In the simulation a bee detects an intruder
at its hive, begins to get excited and simultaneously emits an alarm
pheromone, which increases the pheromone level in the situation. This emission
of alarm pheromone then excites other bees, who also emit the pheromone. Once
the alarm pheromone level exceeds the alarm threshold of an individual bee, that
bee becomes fully excited and attacks the intruder.

For our bee representation, each bee had three emotion (E) tuples relating 
a particular perceived situation in the world to a new situation where their 
emotions have changed. 

P0 = no intruder                                  E0 = emotion(calm)
P1 = detect intruder and emotion(calm)            E1 = emotion(excited)
P2 = detect intruder and emotion(excited)         E2 = emotion(aggressive)
     and pheromone level > alarm

Bees also had four UR tuples. The goals of bees are implicitly represented be- 
cause the actions of bees protect their hive. 



P0 = no intruder                                  A0 = patrolling
P1 = detected intruder and emotion(calm)          A1 = threatening
P2 = detected intruder and pheromone level < alarm
     and emotion(calm) and A1                     A2 = release pheromone
P3 = detected intruder and pheromone level > alarm
     and emotion(aggressive) and A1               A3 = fighting and A2
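
The alarm cascade can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the simulation itself: every bee is assumed to detect the intruder at every step, each excited or aggressive bee emits one unit of pheromone per step, and the names `Bee` and `simulate` are hypothetical.

```python
class Bee:
    """A bee whose emotion follows the three E tuples (calm, excited, aggressive)."""

    def __init__(self, alarm_threshold):
        self.alarm_threshold = alarm_threshold
        self.emotion = "calm"

    def step(self, intruder, pheromone_level):
        """Update the emotion, then return (actions, units of pheromone released)."""
        if not intruder:
            self.emotion = "calm"
            return ["patrolling"], 0.0
        if self.emotion == "calm":
            self.emotion = "excited"
        elif self.emotion == "excited" and pheromone_level > self.alarm_threshold:
            self.emotion = "aggressive"
        if self.emotion == "aggressive":
            return ["fighting", "release pheromone"], 1.0
        return ["threatening", "release pheromone"], 1.0


def simulate(bees, steps):
    """Run the cascade with an intruder present; return the emotions after each step."""
    pheromone = 0.0
    history = []
    for _ in range(steps):
        # All bees react to the level at the start of the step.
        released = sum(bee.step(True, pheromone)[1] for bee in bees)
        pheromone += released
        history.append([bee.emotion for bee in bees])
    return history
```

With two bees whose thresholds are 1.5 and 3.5, the bee with the lower threshold turns aggressive one step before the other, which is the staggered excitation described above.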



The class relationship diagram of the bee simulation is shown in Figure 2. 



[Figure 2: class relationship diagram of the classes Bee, Hive and Situation.
Legend: class relationship icons for HasA (always) and HasA (at time t);
association syntax 1 = one class, N = many classes.]

Fig. 2. Bee Simulation Class Diagram



3.2 Simulation of Adaptation in Cats 

In the simulation of feeding cats, the cats learn that their owner S hitting a
tin of cat food implies that S has put out food for them. The situation is divided
into two partitions, the Kitchen and the Lounge Room. The cats prefer the
warmer room, so on the basis of warmth alone the cats will
spend the majority of their time in the Lounge Room. Each time S puts out
food, S then hits the tin of cat food. If the cats do not come to the kitchen,
S picks them up and carries them there. After being carried a few times, the cats
learn that the sound of the tin implies that food is available in the kitchen. The
class relationship diagram of the cat simulation is shown in Figure 3.



[Figure 3: class relationship diagram including the Situation Blackboard class.
Legend: class relationship icons for HasA and Inheritance; association syntax
1 = one class, N = many classes.]

Fig. 3. Cat Simulation Class Diagram



The cats were given two UR type behaviours which enabled them to cope
with situations where several conditions held simultaneously by using an EMR with
goals. The goal importance I determined which tuple had priority.

$$UR_{cat} = \{(P_0, G_0, A_0), (P_1, G_1, A_1)\} \text{ where } I(G_0) > I(G_1)$$

P0 = detect food    G0 = satisfy hunger    A0 = eat food
P1 = detect cold    G1 = satisfy warmth    A1 = move to warmer place

During the simulation, the cats updated their long term memory while all of
their goals were satisfied. By examining their short term memory of events, the
cats realised that the best time to move to the kitchen for food was when the
sound of the tin was heard. Thus

$$M_{lt} = \{(G_0, CR_0 = (P_0, A_0))\}$$ where
P0 = detect tin sound    A0 = move to food place
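
How a conditioned cue might be extracted from short term memory can be sketched as follows. This is a minimal illustration, assuming that the memory is a flat list of perceptions, that only the perception immediately preceding the goal is considered, and that a cue must recur at least `min_count` times; the function name and the threshold are hypothetical.

```python
from collections import Counter


def learn_conditioned_response(short_term, goal_perception, min_count=2):
    """Return the perception that most often immediately precedes the goal, if it recurs."""
    preceding = Counter(
        earlier
        for earlier, later in zip(short_term, short_term[1:])
        if later == goal_perception and earlier != goal_perception
    )
    if not preceding:
        return None
    cue, count = preceding.most_common(1)[0]
    return cue if count >= min_count else None
```

For the memory `["warm", "tin sound", "food", "warm", "warm", "tin sound", "food"]` and the goal perception `"food"`, the recurring cue is `"tin sound"`, which would then be paired with the action of moving to the food place.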

4 Conclusion 

Previously, entities in simulations had all the important domain knowledge em- 
bedded into their specifications. However, these entities are unable to function 
optimally in other situations, simply because they are not designed to do so. 

We presented a theoretical adaptation model based upon a stratification of
behavioural responses which enables entities to adapt to new situations. Our
model was implemented in two simulations which demonstrated different
elements of implicit and explicit mentalistic representations. The work presented
is important to the development of our wider area of study, story authoring,
which is currently being implemented.

Computer simulations have been attempted for a long period of time, and
entities within these simulations often embody large amounts of complex and
specific domain knowledge. One can envisage the benefits of the cross-domain
interactions which adaptation capabilities would make possible for entities.

Abstract. Evolution can be regarded as a form of natural reinforcement learning.
We ran simulations of mutual association with varying population size
to investigate evolution and learning. Mutual associative memory is our
extension of the hetero-association or temporal association of the
associative memory of J. J. Hopfield [1]. It serves as the memory of an
organism and as the tool for investigating evolution and learning.
Genetic algorithms are used to evolve the weights of the mutual associative
memory. Our results show that evolution of learning can be observed
when organisms change the rule itself during their lifetime.



1 Introduction 

A mutual associative model typically associates the pattern to be stored with
different patterns. We use the term "mutual associative memory" for a concept
extended from the so-called hetero-association of the simple perceptron. Our
mutual associative memory uses a fully connected neural network instead of
multi-layer networks with back-propagation, and we observe the evolution and the
capacity of learning using genetic algorithms.

The mechanism of mutual associative memory is implemented by adjusting the
weights, that is, the degree of effect that the synapses have on the neurons
they connect. These weights are called the "weights of connection" or the
"connection matrix", since they are usually represented as a matrix. We call
the adjustment of the weights of connection the "learning" or "remembrance"
of the neural network, and we call the learning (adjustment) of the weights of
connection during an organism’s lifespan the "Mutual Association" of the
neural network. The increase, over many generations, of the capacity to
associate mutually is called the "Evolution of Learning".

We have proposed a toy model of mutual associative memory in a few papers
[2][3]. The models have now become more natural and the simulation time
shorter, since a population size that varies through the death of organisms was
added to our genetic algorithms. Hereafter we call the mutual associative memory
an organism or an individual, since we regard it as an organism which has the
ability to learn in a virtual environment.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 446-453, 1999.

2 Mutual Associative Memory

The model of mutual associative memory uses genetic algorithms to evolve the
functions of recalling memories and mutually associating memories or
experiences. We think that the genetic algorithm could be a model of the
reinforcement learning that nature has been executing for more than three
billion years. The selection of parents and the reproduction of offspring in the
genetic algorithm correspond to the single yes/no reinforcement signal in
reinforcement learning.

Figure 1 shows what we mean by the term "Mutual Association".
In auto-association the recalled pattern (output) is the same as the input pattern;
in hetero-association, by contrast, the output is distinct from the input. These
two associations deal only with known patterns. The third and fourth associations
also deal with unknown patterns. We call the third association, in which
the input is an unknown pattern and the output is a known pattern, "Learning
unknown" or simply "Learning". The fourth association, in which the input is a
known pattern or an unknown pattern and the output is a new unknown pattern,
is called "Creating".



Mutual Association has four kinds of associations
(trigger (input) -> association (output)):

0. Auto-association   : a known pattern -> the known pattern itself
1. Hetero-association : a known pattern -> another known pattern
2. Learning (unknown) : an unknown pattern -> a known pattern
3. Creating           : a known pattern or an unknown pattern -> a new unknown pattern

A known pattern means a pattern learned by initial Hebbian learning.
An unknown pattern means a pattern not learned by initial Hebbian learning
(a new pattern the organism meets in its life).

Fig. 1. Concept of Mutual Association



2.1 Evolution of Mutual Association 

(1) The initial weight matrices (w^H_ij) are made by the Hebbian rule or set to
all zero. The Hebbian rule determines the elements of the weight matrix as follows:

$$w^H_{ij} = \frac{1}{N}\Big(\sum_{\mu=1}^{p} \xi_i^{\mu}\xi_j^{\mu} + \lambda \sum_{\mu=1,\nu=1}^{p} \xi_i^{\mu}\xi_j^{\nu}\Big)$$

where p is the number of patterns to be stored and ξ_i^μ is the i-th bit of the μ-th
initial pattern. λ is a constant that governs the relative strength of the first and
second terms. The second term specifies the fixed pairs between initial patterns
(ξ^μ and ξ^ν) for f_h (5) and f_c (7).

(2) N chromosomes with a fixed length of between 2401 and 4096 alleles are
randomly made. The alleles are chosen randomly from {−1, 0, +1}, where the
probability of selecting either −1 or 0 is set to 0.01 in this paper. Allele −1
means to reverse excitatory/inhibitory connections, and 0 means to remove the
connection. These alleles (−1 and 0) are used to give a small perturbation to the
synaptic weights, as Sompolinsky wrote [5]. This makes the initial weight
matrices slightly asymmetric.

$$w^n_{ij} = w^H_{ij} \cdot c^n(Ni + j) \quad (i, j = 1, 2, \ldots, N;\ n = 1, 2, \ldots, 128)$$

where w^n_ij denotes the (i, j) element of the n-th weight matrix in the population and
c^n(k) denotes the k-th allele of the n-th chromosome.

(3) Update the states asynchronously in discrete time, as follows:

$$S_i(t+1) = \mathrm{sgn}\Big(w_{ii} S_i(t) + \sum_{j \neq i}^{N} w_{ij} S_j(t)\Big)$$

where S_i(t) is the state of the i-th neuron at time t, and sgn is the sign function,
sgn(x) = 1 if x ≥ 0 and sgn(x) = −1 if x < 0. Hopfield set the
self-coupling diagonal terms w_ii = 0. We found that when w_ii > 0 (chaos neural
network) auto-association converges extremely fast.

(4) Go to (5), (6) or (7) depending on the kind of mutual association.

(5) Evaluate the mutual fitness value f_h for hetero-association. First, the sum of
the similarity between the initial state vectors and the varying state vectors over
a fixed time T_max is calculated as the mutual relation. Then this sum is divided by
the product of the number of patterns to be stored p, a certain maximum
life time T_max, and the number of neurons N.

$$f_h = \frac{\sum_{\mu=1,\nu=1}^{p} \sum_{t=2}^{T_{max}} \sum_{j=1}^{N} \xi_j^{\mu} s_j^{\nu}(t)}{p \cdot T_{max} \cdot N}$$

where ξ_j^μ is the j-th bit of the μ-th initial pattern, which takes the value of either
−1 or +1, and s_j^ν(t) is the state of the j-th neuron at time t when the ν-th
initial pattern is given to the network. f_h = 1 means all the pairs of ξ^μ and ξ^ν
are stored as fixed points. Go to (8).

(6) Evaluate the mutual fitness f_l for unknown patterns as follows:

$$f_l = \frac{\sum_{\mu=1}^{p} \sum_{t=2}^{T_{max}} \sum_{j=1}^{N} \pi_j^{\mu} s_j^{\mu}(t)}{p \cdot T_{max} \cdot N}$$

where π_j^μ is the j-th bit of the μ-th unknown pattern. Go to (8).

(7) Evaluate the mutual fitness f_c for both known and unknown patterns:

$$f_c = \frac{\sum_{\mu=1,\nu=1,\mu\neq\nu}^{p} \sum_{t=2}^{T_{max}} \sum_{j=1}^{N} \pi_j^{\mu} s_j^{\nu}(t)}{p \cdot T_{max} \cdot N}$$

where π_j^μ and s_j^ν(t) mean the same as in (5) and (6).

(8) Evaluate the fitness f_a for auto-association as follows:

$$f_a = \frac{\sum_{\mu=1}^{p} \sum_{t=2}^{T_{max}} \sum_{j=1}^{N} \xi_j^{\mu} s_j^{\mu}(t)}{p \cdot T_{max} \cdot N}$$

f_a = 1 means all the patterns are stored as fixed points.

(9) Extend or shorten the lifetime of organisms according to the mutual fitness
value calculated by (5), (6) or (7). Kill organisms whose remaining days are zero.

(10) Select two parents at random from the upper 40% of the population sorted
in descending order of fitness. Recombination is then performed with uniform
crossover to generate child chromosomes. Next, mutation is applied to the
offspring with mutation rate 0.05. The value of a randomly selected allele in
chromosome c^n(k) rotates cyclically: +1 ⇒ −1 ⇒ 0 ⇒ +1.

(11) If the highest fitness value (which differs between the cases using f_h, f_l
or f_c) reaches 1.0, or the number of generations exceeds the upper limit, the
simulation terminates. If not, the above processes from (2) to (10) are repeated.
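
Steps (3) and (8) can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not our simulation code: the weights are built by the plain Hebbian rule with zero diagonal and without the pair term, the fitness is averaged over all `t_max` update sweeps, and the function names are hypothetical.

```python
import random


def sgn(x):
    """Sign function with sgn(0) = +1."""
    return 1 if x >= 0 else -1


def hebbian_weights(patterns):
    """Plain Hebbian weights with zero self-coupling (assumed; no pair term)."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def async_step(weights, state, rng):
    """One asynchronous sweep: neurons are updated one at a time in random order."""
    n = len(state)
    for i in rng.sample(range(n), n):
        field = sum(weights[i][j] * state[j] for j in range(n))  # includes w_ii * S_i
        state[i] = sgn(field)
    return state


def auto_fitness(weights, patterns, t_max, seed=0):
    """Average overlap between each stored pattern and the state it evolves into."""
    rng = random.Random(seed)
    n, total = len(patterns[0]), 0.0
    for pattern in patterns:
        state = list(pattern)
        for _ in range(t_max):
            async_step(weights, state, rng)
            total += sum(x * s for x, s in zip(pattern, state))
    return total / (len(patterns) * t_max * n)
```

A fitness of 1.0 is returned when every stored pattern is a fixed point of the update rule, for instance for the two orthogonal patterns `[1, 1, 1, 1]` and `[1, -1, 1, -1]`.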

The mutual associative memory evolves the weight matrix explained in (1) during
the above processes. The simulation is a mixture of "learning on an individual
level during its lifetime" and "learning on a population level during evolution".
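
The genetic operators of steps (2) and (10) can be sketched as follows. The sketch rests on several assumptions that are not fixed by the text: −1 and 0 are each drawn with probability 0.01, the mutation rate is applied per allele, the population size is held constant, and all function names are hypothetical.

```python
import random

# Cyclic rotation of an allele under mutation: +1 -> -1 -> 0 -> +1.
NEXT_ALLELE = {1: -1, -1: 0, 0: 1}


def random_chromosome(length, rng, p_special=0.01):
    """Draw alleles from {-1, 0, +1}; -1 and 0 each with probability p_special (assumed)."""
    def draw():
        r = rng.random()
        if r < p_special:
            return -1
        if r < 2 * p_special:
            return 0
        return 1
    return [draw() for _ in range(length)]


def uniform_crossover(a, b, rng):
    """Take each allele from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chromosome, rng, rate=0.05):
    """Rotate each allele cyclically with probability `rate` (per-allele rate assumed)."""
    return [NEXT_ALLELE[g] if rng.random() < rate else g for g in chromosome]


def next_generation(population, fitness, rng, elite_fraction=0.4):
    """Keep the best 40% and replace the rest with mutated children of two random elites."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(2, int(len(ranked) * elite_fraction))
    elite = ranked[:n_elite]
    children = []
    while len(elite) + len(children) < len(population):
        mother, father = rng.sample(elite, 2)
        children.append(mutate(uniform_crossover(mother, father, rng), rng))
    return elite + children
```

The lifetime mechanism of step (9), which lets the population size vary, is deliberately left out of this sketch.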



2.2 Varying population size 



We simulated hetero-association with varying population size. The lifetime (4
generations on average) is extended or shortened according to the mutual
fitness of an organism. Even a 5% natural increase of the population would cause
exponential growth of the population size if a constant decrease of the
population were not imposed. Figure 2 (a) shows the increase of hetero-fitness
when the population is decreased to 64 organisms every 50 generations. The thick
zigzag broken line is the change of the B-population. Figure 2 (b) shows the
result of the same simulation with constant population.

We calculated the total population in order to study the relation between the
increase of the fitness value and the total population at a certain generation.
Figure 2 (c) shows the comparison. Here the lines which have "v" at the head are
the results with varying population size, and "c" marks the results with constant
population. The first column shows the total population and the second
shows the values of fitness. We can see that the values of B-hetero-fitness with
varying population and with constant population are nearly equal (**), although
the total population with constant population is nearly twice the total
population with varying population at both the 1000th and the 12000th
generation (*).

Our models became more natural and the simulation time became shorter,
since varying the population size nearly halved the total number of organisms
simulated while reaching nearly the same fitness.
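
The ratios quoted for Figure 2 (c) can be checked with a few lines of arithmetic. The sketch below uses the totals and fitness values reported in Figure 2 (c) for the 1000th and 12000th generations; the dictionary layout and the function name `ratios` are only illustrative.

```python
POPULATION_SIZE = 128  # constant-population runs use 128 organisms

# generations: (total population v, total population c, fitness v, fitness c)
ROWS = {
    1000: (71792, 128000, 0.202175, 0.218822),
    12000: (823325, 1536000, 0.247745, 0.267727),
}


def ratios(generations):
    """Return the c/v ratios of total population and of fitness."""
    total_v, total_c, fit_v, fit_c = ROWS[generations]
    # With constant population the total is simply 128 organisms per generation.
    assert total_c == POPULATION_SIZE * generations
    return round(total_c / total_v, 3), round(fit_c / fit_v, 4)
```

The constant-population runs therefore evaluate close to twice as many organisms for a fitness that is only about 8% higher.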



[Figure 2: (a) Decrease to 64 organisms every 50 generations; (b) Constant
population (128 organisms); (c) Fitness and total population]

(c) Fitness and total population (v = varying population, c = constant population)

                                  total population B    Fitness-B
1. from 0 to 100 generations
   v                                     9734           0.134325
   c                                    12800           0.144468
   c/v                                  1.315           1.0755
2. from 0 to 1000 generations
   v                                    71792           0.202175
   c                                   128000           0.218822
   c/v                                  1.783 *         1.0823 **
3. from 0 to 12000 generations
   v                                   823325           0.247745
   c                                  1536000           0.267727
   c/v                                  1.866 *         1.0807 **

Fig. 2. Varying population size and Constant population

2.3 Hetero-association (associating another known pattern)

We simulated hetero-association with and without initial learning of the pairs
between initial patterns (λ > 0 and λ = 0 in 2.1(1), respectively). Figure 3
(left) shows the results of the simulation using 2.1(5) with λ varied from 0 to 3.0.
Organisms remember 16 initial patterns, and also 8 pairs between initial patterns
at birth if λ > 0. Hetero-fitness increases as λ increases from 0 to
1.0 at both generation 0 and generation 12000, but begins to decrease slightly
when λ rises above 1.0, since the memory of the initial patterns is gradually lost.

Figure 3 (right) shows the same results as Figure 3 (left), except that the
organism remembers only 4 pairs of initial patterns at the beginning, half the
number in Figure 3 (left). In Figure 3 (right) hetero-fitness increases slightly
even when λ exceeds 1.0 at both generation 0 and generation 12000, since the
organism has enough spare storage capacity.

We can observe that the effect of learning the pairs initially (λ > 0 in 2.1(1))
becomes insignificant after long evolution. This can be seen from the curves of
mutual fitness at generation 12000 in Figure 3 (thick broken lines). Natural
reinforcement learning, which corresponds to steps 2.1(2) to (10), is not inferior
to compulsory learning of the pairs of patterns at birth.

Figure 4 shows the results of hetero-association started with λ = 0 (left) or
λ = 1.0 (right) and 12 initial patterns. In Figure 4 (left) hetero-fitness
cannot catch up with auto-fitness. In Figure 4 (right), however, hetero-fitness
catches up with auto-fitness near the 300th generation and stays higher
from near the 500th generation. This is the result of the smaller number of
initial patterns and of λ > 0. The curves of hetero-fitness and auto-fitness
are nearly mirror images of each other.





Fig. 3. Hetero-fitness vs λ (N=49, p=16 (left), p=8 (right))

2.4 Learning unknown

Learning may be described as associating a known pattern with a given unknown
pattern (see Figure 1 and 2.1(6)). This process may be recognized as the process of
learning, for example, new languages. We can also observe that auto-fitness
decreases as learning-fitness increases, and the curves of learning-fitness and
auto-fitness are nearly mirror images of each other.

3 Evolution of learning 

Here we try to simulate the evolution of learning capacity using the mutual
associative memory. This is a simulation of mutual associative memory with
many populations. To be exact, this simulation of the evolution of learning is a
simulation of "learning with many teachers among many populations". An
imaginary language is the target of "Learning". We also try to simulate one of the
"dynamics to change the rule itself", since one of the essences of life is the
ability to change itself, or open dynamics [6].

We simulate here the evolution of learning capacity, that is, the evolution of
the weight matrix for learning no more than tens of words. First, we prepare two
populations which have evolved their own languages separately. Second, the two
populations begin to communicate in one of their languages. The fitness function
is the ability to communicate with another population in some language. The
survival rule is the same as in the simulation of mutual association. The
situation in which a third medium, known to neither A nor B, is used can also be
considered. This may mean creating a new language.

Figure 5 shows the algorithm of the evolution of learning on the condition that
organisms change the rule by themselves. They always watch the other populations
to see which language-fitness has the higher value, in order to choose the
language which they learn next.



Genetic Algorithm Implementation (Evolution of Learning Ability)

initialize N populations, each of 64 - 128 organisms,
    by learning own language with Hebb rule
while ( mutual-fitness < 1.0 && generation < 12000 ) do
    evaluate mutual-fitness of N populations
        by comparing with N learning abilities
    select language learning from N languages
    evaluate mutual-fitness of each individual
        by counting how many relations can be memorized
    repeat until the worst 60% in the population are replaced do
        select two individuals randomly from the best 40%
        recombine them with uniform crossover to produce offspring
        mutate all the offspring

Fig. 5. The algorithm of evolution of learning the language on the condition that
organisms change the rule itself
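
The step "select language learning from N languages" can be illustrated by a one-function Python sketch. It assumes that each population simply adopts the language of whichever other population currently shows the highest language-fitness; the function name and the dictionary representation are hypothetical.

```python
def choose_language(own, language_fitness):
    """Pick the other population's language with the highest current language-fitness."""
    others = {name: f for name, f in language_fitness.items() if name != own}
    # With no other population to watch, keep the native language.
    return max(others, key=others.get) if others else own
```

For example, `choose_language("A", {"A": 0.30, "B": 0.20, "C": 0.25})` selects language C, because C is the fittest of the populations that A observes.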

3.1 Evolution when the rule never changes

We observe the evolution of learning as the interaction among many populations.
Figure 6 shows the result for populations A and B with the same fixed
rules. The rule is that population A uses only its own language A from the start
to the middle of its members’ lives, but learns language B from the middle to the
end. Population B uses the same rule as population A. The language-fitness is
calculated using expression 2.1(6): the mutual fitness f_l for unknown patterns.
Figure 6 shows that the language-fitness of both A and B increases as the
generations go on. They show nearly the same values, though the amplitude of B
is bigger than that of A. In contrast, the fitness of both A and B decreases. These
results show that both populations acquire more ability to learn a
foreign language as the generations go on. The simulations with more than two
populations also show progress as the result of evolution.




Fig. 6. Competition of language ability by two populations A and B 



3.2 Evolution when organisms change the rule

Here we simulate on the condition that organisms change the rule itself. The first
simulation is of evolution when populations change the language they learn during
their own lives. The second is of evolution when populations change the
language they learn at the start of each generation.

Figure 7 shows the result of the first one. In this simulation all organisms have
two kinds of processes in their lifetimes. Organisms learn their own language
in the first half of life, and in the second half they decide which language to
learn by comparing the language abilities of the other populations. We can observe
evolution in both A-language-fitness and B-language-fitness. A-language-fitness
shows chaotic bifurcation.

Figure 8 shows the result of the same simulation with four populations. Here
C-language-fitness shows a much higher increase than A, B and D. The two sudden
increases of C-language-fitness at about the 500th and 4000th generations attract
much attention. These may have been caused by mutations.

The second simulation, in which populations change the language they learn at
the start of each generation, showed no evolution except in the first 500 generations.

3.3 Creating new medium

Creating a new medium means that organisms invent, for example, a new sign
comprehensible only to them. We tried two simulations, initialized with the
Hebbian weight matrix and with the zero matrix (see 2.1(1)). The organisms do not
learn their native language when we use the zero matrix for the initial weights.
The mutual fitness f_c is used here to evaluate communication-fitness (2.1(7)).

Fig. 7. Two populations A and B change language learning in their life time

Fig. 8. Four populations change the language learning in their life time

We could observe the evolution of communication only when the two populations
have their native languages. The simulation without native languages showed an
insignificant increase of communication-fitness.

4 Conclusion 

(1) Genetic algorithms with varying population size improve the performance.

(2) We obtained a mutual capacity of 14% as a result of the evolution of
hetero-association.

(3) The cognition between a known pattern and another known pattern is easier
than the cognition between a known pattern and an unknown pattern.

(4) Evolution of learning can be observed only when organisms change the rule
itself during their lifetime.

(5) Evolution of communication, or making contact with others, needs initial
knowledge of a native language in advance.

Abstract. This paper presents a cellular automata model of a duopolistic
market with consumers’ learning and network externalities. The model
produces various dynamics of the market. In particular, if the user cost
can differ locally, it generates dynamics so rich that aggregate
models could not explain them. The results of simulations also suggest that the
long-run consequence of duopolistic competition may depend crucially
on the initial condition.

1 Introduction 

When you buy an application, you probably take account of how long you
have used it and how many others will use it. Even if an application with
higher performance is available, you may hesitate to switch to it from the one you
are familiar with, suspecting understandably that mastering a new application
may require considerable time and effort. You may, however, abandon
your favorite application if increasingly many of your friends and colleagues
use another one, fearing naturally that adhering to it may make it difficult to
exchange data and programs with them. We should like to examine markets
where consumers consider these things: consumers’ learning by doing and
network externality, which plays an important role in the so-called
information-oriented society.

This paper, which describes a duopolistic market with network externality
and consumers’ learning by doing, is a generalisation of our previous one (Oda et
al.), which examines the dynamics of a monopolistic market with network
externality. Both are cellular automata models; what is new in the present
work is only the existence of a rival product and the dependence of the
consumers’ reservation prices for products on their past purchasing behaviour.
Although consumers do not move around, their behaviour is influenced by
their experience, so that our model has become similar to the CA + Agent models
of Epstein and Axtell.

We shall explain our model in Section 2 and mention a few results of its 
simulation in Section 3. Owing to the introduction of consumer’s learning and a 
rivalling product, the dynamics of the market has become much more complex: 
in addition to the drastic change of the final equilibrium by a small change in 
the initial condition, it is often observed that the market goes on changing in a 
complicated — not simple but not random — manner. 

2 The model 

Let us assume the following. 

1. There are $M^2$ consumers in a closed society. Every consumer has a personal
computer for which two operating systems are available. To use an OS, each
consumer must make a new or renewal contract with its supplier at the beginning
of every week. We designate $X(m,n,t) = 1$ if Consumer $m$ ($m = (m_1, m_2)$,
$1 \le m_1 \le M$ and $1 \le m_2 \le M$) contracts with the supplier of OS$n$ ($n = 1$ or $2$)
for Week $t$ ($t = 0, 1, 2, \ldots$) and $X(m,n,t) = 0$ if he or she does not.

2. The utility which Consumer $m$ obtains from using his or her computer for
Week $t$ is

$$U(m,t) = \max_{n \in \{1,2\}} \bigl( X(m,n,t)\,U(m,n,t) + \alpha\,X(m,3-n,t)\,U(m,3-n,t) \bigr) \tag{1}$$

where $\alpha$ is a constant while $U(m,n,t)$ represents Consumer $m$’s utility from using
OS$n$ alone. Here $0 \le \alpha \le 1$ is assumed because using two operating systems does
not bring in twice as much utility as using one.

3. Consumer’s utility from using an OS consists of three terms:

$$U(m,n,t) = U_{\min} + \theta\,(U_{\max} - U_{\min})\,L(m,n,t) + (1-\theta)\,(U_{\max} - U_{\min})\,N(m,n,t) \tag{2}$$

where $U_{\min}$, $U_{\max}$ and $\theta$ are given constants ($0 \le U_{\min} \le U_{\max}$ and $0 \le \theta \le 1$).
Here the first term of (2) stands for the basic utility that a beginner can readily
obtain from stand-alone computer usage.

¹ The other new point is that each consumer’s neighbourhood is probabilistically
determined according to the method developed by Markus and Hess. Yet it does
not seem to have significant effects, at least in the simulations mentioned in this
paper.

4. The second term of (2) represents the effect of consumers’ learning by doing:
one can obtain more utility from the same OS as he or she uses it longer. Here
$L(m,n,t)$ stands for the skill for using OS$n$ that Consumer $m$ has acquired till
time $t$ (the beginning of Week $t$), which is defined as

$$L(m,n,t) = \begin{cases} L(m,n,0) & \text{for } t = 0 \\ \lambda \sum_{k=1}^{t} (1-\lambda)^{k-1} \max\bigl(X(m,n,t-k),\ \beta X(m,3-n,t-k)\bigr) & \text{for } t \ge 1 \end{cases} \tag{3}$$

where $\lambda$ and $\beta$ are given constants ($0 < \lambda < 1$) while $L(m,n,0)$ are all given as
a part of the initial condition ($0 \le L(m,n,0) \le 1$). Here $\lambda$ stands for the speed
of skill depreciation, while $\beta$ represents the substitutability between the two
operating systems: the increase of the skill for using an OS by using the other OS
alone is $100\beta$ percent of its increase by using the OS. Note that the second term
of (2) is regarded as the product of the degree of skill accumulation $L(m,n,t)$
and its absolute weight on the total consumer’s utility $\theta(U_{\max} - U_{\min})$, because
$0 \le L(m,n,t) \le 1$, $\lim_{t\to\infty} L(m,n,t) = 1$ if $X(m,n,0) = X(m,n,1) = \ldots = 1$,
and $\lim_{t\to\infty} L(m,n,t) = 0$ if $X(m,n,0) = X(m,n,1) = \ldots = 0$.
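
The limiting behaviour of the skill term can be checked with a short calculation. The derivation below is a sketch that assumes the weights in (3) are the geometric weights $(1-\lambda)^{k-1}$; under that assumption a consumer who contracts for OS$n$ every week has

```latex
L(m,n,t) \;=\; \lambda \sum_{k=1}^{t} (1-\lambda)^{k-1}
         \;=\; \lambda \cdot \frac{1-(1-\lambda)^{t}}{1-(1-\lambda)}
         \;=\; 1-(1-\lambda)^{t}
         \;\longrightarrow\; 1 \qquad (t \to \infty).
```

With $\lambda = 0.5$, for example, this gives $L = 0.5,\ 0.75,\ 0.875$ after one, two and three weeks. By the same calculation, a consumer who uses only the other OS every week accumulates $\beta\,\bigl(1-(1-\lambda)^{t}\bigr)$, which tends to $\beta$.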



5. The third term of (2) stands for the effect of network externality, which is
determined by

$$N(m,n,t) = \frac{1}{|\Omega(m)|} \sum_{i \in \Omega(m)} \max\bigl(X(i,n,t),\ \gamma X(i,3-n,t)\bigr) \tag{4}$$

Here $\gamma$ is a given constant ($0 \le \gamma \le 1$); $\Omega(m)$ represents the set of Consumer
$m$’s neighbours:

$$\Omega(m) = \{\text{Consumer } i \mid \mathrm{dis}(i,m) \le R\} \tag{5}$$

where $R$ is a given constant ($1 \le R$); $|\Omega(m)|$ stands for the number of Consumer
$m$’s neighbours. It is tacitly assumed that in our model cells (consumers) are
arranged so that those who exchange more information are nearer. That is to
say, in our terms neighbours are not those who live nearby but those
who share the same interests.
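
The neighbourhood average described in this item can be sketched in a few lines of Python. The sketch rests on assumptions that the text does not fix: dis() is taken to be the Chebyshev (chessboard) distance, the grid does not wrap around at its edges, and the neighbourhood is deterministic rather than probabilistically determined. The function names and the dictionary layout of X are hypothetical.

```python
from itertools import product


def neighbours(m, M, R):
    """Return the cells within distance R of cell m on an M x M grid.

    Assumptions of this sketch: dis() is the Chebyshev distance, the grid
    has hard edges, and m counts as its own neighbour (dis(m, m) = 0 <= R).
    """
    m1, m2 = m
    return [
        (i1, i2)
        for i1, i2 in product(range(1, M + 1), repeat=2)
        if max(abs(i1 - m1), abs(i2 - m2)) <= R
    ]


def network_externality(X, m, n, M, R, gamma):
    """Degree of network externality for Consumer m and OS n in one week.

    X maps (cell, os) -> 0 or 1 for the week in question. A neighbour who
    uses OS n counts fully; one who uses only the other OS counts gamma.
    """
    omega = neighbours(m, M, R)
    total = sum(max(X[(i, n)], gamma * X[(i, 3 - n)]) for i in omega)
    return total / len(omega)


# Usage: on a 5 x 5 grid where everybody uses OS1 only.
M = 5
X = {((i, j), n): (1 if n == 1 else 0)
     for i in range(1, M + 1) for j in range(1, M + 1) for n in (1, 2)}
print(network_externality(X, (3, 3), 1, M, 2, 0.5))
print(network_externality(X, (3, 3), 2, M, 2, 0.5))
```

Passing last week's contracts as X gives the value a consumer would expect under the adaptive rule of item 6.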

We can find some similarities between the second and the third term of (2). First,
since $0 \le N(m,n,t) \le 1$, we can regard the third term as the product of the
degree of network externality $N(m,n,t)$ and its absolute weight on the total
consumer’s utility $(1-\theta)(U_{\max} - U_{\min})$. Secondly, $\beta$ and $\gamma$ play a similar role:
$\beta$ is larger if consumers can use both operating systems in a more similar
way, while $\gamma$ is larger if users of different operating systems can more easily
exchange data and programs. Thirdly, $\Omega(m)$ corresponds to $\lambda$: the former sets
the contemporary boundary to network externality while the latter limits the
benefit from past experience.



6. Consumers follow a simple adaptive behaviour: they calculate $N(m,n,t)$ on
the supposition that $X(i,n,t) = X(i,n,t-1)$ for all $i \in \Omega(m)$. In other words,
at time $t$ Consumer $m$ expects the following value for Week $t$:

$$\hat{N}(m,n,t) = \begin{cases} N(m,n,0) & \text{for } t = 0 \\ \dfrac{1}{|\Omega(m)|} \displaystyle\sum_{i \in \Omega(m)} \max\bigl(X(i,n,t-1),\ \gamma X(i,3-n,t-1)\bigr) & \text{for } t > 0 \end{cases} \tag{6}$$

where $N(m,n,0)$ are given as the other part of the initial condition ($0 \le
N(m,n,0) \le 1$). We also define $\hat{U}(m,n,t)$ by replacing $N(m,n,t)$ with $\hat{N}(m,n,t)$
in (2) and $\hat{U}(m,t)$ by replacing $U(m,n,t)$ with $\hat{U}(m,n,t)$ in (1). This is
how consumers form their expected weekly utility $\hat{U}(m,t)$ at the beginning of
each week.

7. Consumer’s cost for using a computer is given by

$$C(m,t) = X(m,1,t)\,P_1 + X(m,2,t)\,P_2 \tag{7}$$

where $P_n$ represents the cost of using OS$n$. It will be assumed to
be constant for Examples 1, 2 and 3 of the next section, while it will be regarded
as

$$P(m,n,t) = Q(n,t) + R(m,n,t) \tag{8}$$

for Examples 4, 5, 6 and 7. Here $Q(n,t)$ stands for the rental fee for using OS$n$
while $R(m,n,t)$ represents Consumer $m$’s fees for using OS$n$ applications.
The former decreases as the total number of the users of the OS increases, while
the latter decreases as the number of Consumer $m$’s neighbours who use the
OS increases:



$$Q(n,t) = Q_{n\min} + (Q_{n\max} - Q_{n\min})\,[\ldots] \tag{9}$$

$$R(m,n,t) = R_{n\min} + (R_{n\max} - R_{n\min})\,[\ldots] \tag{10}$$

$$x(n,t) = \tau Z(n,t-1) + [\ldots] \tag{11}$$

$$y(m,n,t) = \tau W(m,n,t-1) + [\ldots] \tag{12}$$

$$[\ldots] \tag{13}$$

$$W(m,n,t-1) = \frac{[\ldots]}{|\Omega(m)|} \tag{14}$$

Here $Q_{n\max}$, $Q_{n\min}$, $R_{n\max}$, $R_{n\min}$ and $\tau$ are all given positive constants.

8. At time $t$ Consumer $m$ calculates the expected utility net of cost,

$$V(m,n,t) = \hat{U}(m,t) - C(m,t) \tag{15}$$

for all the four possible combinations of $X(m,1,t)$ and $X(m,2,t)$: (0,0), (0,1),
(1,0) and (1,1), and chooses the combination that maximises $V(m,n,t)$ as
$(X(m,1,t), X(m,2,t))$.
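
The weekly choice in this item amounts to comparing four numbers. The following Python sketch illustrates it for constant prices; the function names are hypothetical, u1 and u2 stand for the expected utilities from using OS1 or OS2 alone, and the tie-breaking rule is an arbitrary choice of the sketch, since the text does not say how ties are resolved.

```python
from itertools import product


def expected_utility(x1, x2, u1, u2, alpha):
    """Equation (1): utility from a given combination of contracts.

    x1, x2 are 0/1 contract indicators; u1, u2 are the expected
    utilities from using OS1 or OS2 alone.
    """
    return max(x1 * u1 + alpha * x2 * u2,
               x2 * u2 + alpha * x1 * u1)


def choose_contracts(u1, u2, p1, p2, alpha):
    """Return the (x1, x2) pair that maximises utility minus cost."""
    def net(x):
        x1, x2 = x
        cost = x1 * p1 + x2 * p2  # equation (7)
        return expected_utility(x1, x2, u1, u2, alpha) - cost

    # Candidates are examined in the order (0,0), (0,1), (1,0), (1,1);
    # on a tie the first one wins (an assumption of this sketch).
    return max(product((0, 1), repeat=2), key=net)


# Usage with the prices of Examples 1 to 3 (P1 = P2 = 0.25) and alpha = 0.
print(choose_contracts(0.3, 0.2, 0.25, 0.25, 0.0))
```

With $\alpha = 0$ the combination (1,1) pays two prices for the utility of one OS, so it is never strictly preferred; this is consistent with the remark in Section 3 that no consumer uses both operating systems in the examples.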

3 Simulations 

Let us show some results of simulations for the following parameter values and
initial condition: $M = 50$, $R = 2$, $U_{\min} = 0.2$, $U_{\max} = 0.4$, $\alpha =
\beta = \gamma = 0$, $\lambda = 0.5$ and $L(m,n,0) = 0.5$ for all $m$ and $n$. In the following figures,
black points, gray points and white points represent OS1 users, OS2 users and
non-users respectively (no consumer uses both operating systems simultaneously
in the following examples).

3.1 Examples 1, 2 and 3

These are cases where $P_n$ are given constants ($P_1 = P_2 = 0.25$). Since all the
parameters are common to both operating systems, their technical performance
is the same. Failure or success depends entirely on the distribution of the initial
users.

Example $k$ ($1 < k \le 3$) is unconditionally more advantageous for OS2 than
Example $k-1$, because the initial distribution of the OS1 users is common to
the three examples, while the initial OS2 users are chosen so that every initial OS2 user
in Example $k-1$ is also an initial OS2 user in Example $k$. In addition,
the initial condition for every example is chosen so that all consumers will use
an OS if there are no initial users of the other OS.

The long-run consequence of competition is quite understandable: in Example
1 OS1 dominates the whole market; in Example 2 OS1 and OS2 share the
market; in Example 3 OS2 monopolises the market.

Example 3 seems noteworthy. OS1 users and OS2 users rapidly increase at
almost the same rate till every consumer uses either product, but then the
former gradually decrease and disappear in the end. Yet neither the products’
properties nor the consumers’ behaviour has changed when the market is saturated. Both
the rapid diffusion of OS1 and its fade-out are explained by the same parameter
values and the same utility functions.



3.2 Examples 4, 5, 6 and 7 

Let us examine cases where the price of the same product may differ locally.
Let us assume that $R_{n\max} = R + a$ and that $R_{n\min} = R - a$. Since
$R_{n\max} - R_{n\min} = 2a$, the greater $a$ is, the more $P(m,n,t)$ can differ locally. Figures 4,
5, 6 and 7 show cases where $a$ equals 0, 0.03, 0.04 and 0.09 respectively (all the
other parameters and the initial condition are common to all four cases).

The dynamics of the distribution of the users of the two OSs (the black-gray
patterns in the figures) changes qualitatively as $a$ increases. For $a = 0$,
the black-gray patterns are finally fixed (every consumer goes on using either
OS in the long run). For $a = 0.03$, the black and gray belts move rightwards for
ever (every consumer continues to change the product he or she buys cyclically).
For $a = 0.04$, the dynamics is most interesting: after 300 weeks, winding black
and gray stripes (from upper right to lower left) emerge and continue to move
(from upper right to lower left) with their shapes changing for 1000 weeks.
Then the shifting diagonal stripes suddenly disappear; then no steadily changing
patterns can be seen, even approximately, for thousands of weeks; then
the shifting diagonal stripes suddenly appear again.

For $a = 0.09$, the black-gray patterns change unsteadily for at least as long as
the simulation is observed. In short, the larger $a$ is, the more difficult it is to
predict each consumer’s behaviour in the long run.

4 Concluding Remarks

Although we have run only a small number of simulations for very limited
combinations of parameters and initial conditions, we have found both simulation
results that can be explained by some aggregate model and results that cannot.
This suggests that the cellular automata model of an oligopolistic market may have
much richer dynamics than the aggregate model describes. We hope that our
model and simulations can contribute to connecting individual consumers’
interaction and the dynamics of aggregate values in oligopolistic markets.

This project, “Methodology of Emergent Synthesis” (JSPS-RFTF96P00702),
has been supported by the Research for the Future Program of the Japan Society
for the Promotion of Science.



Abstract. This paper illustrates an integrated Computational Intelligence
(CI) technique using Artificial Neural Networks (ANN) and Genetic
Algorithms (GA) for electric load forecasting. A load forecasting
model has been developed based on ANN and GA. The model produces
a short-term forecast of the load in the 24 hours of the forecast day
concerned. Optimum weights and biases of the ANN are found by the
Genetic Algorithm. The technique has been tested on data provided by
an Italian power company, and the results obtained through the application
of the integrated computational intelligence approach show that this
approach is not practical without high computational facilities, as this
problem is very complex. However, this points to the direction of
evolutionary computing being integrated with parallel processing techniques
to solve practical problems.



1 Introduction 

An accurate and stable load forecast is essential for many operating decisions 
taken by utilities. In fact, it is well known that a cheap and reliable power 
system operation is definitely the result of good short-term load forecasting. 
The short-term load forecast provides the information to be adopted in the on- 
line scheduling and security functions of the energy management system, such 
as unit commitment, economic dispatch and load management. Hence, accurate 
load forecasting is essential for the optimal planning and operation of large-scale 
power systems. 

Many techniques have been proposed and used for short-term load forecast- 
ing. Time-series models based on extrapolation are used for the representation of 
load behaviour by trend curves. The time series approach, regression approach, 
state-space models, pattern recognition and expert systems are also some of the 
other techniques used [1-5]. 

The time series approach assumes that the load at a given time depends mainly
on previous load patterns; examples are the auto-regressive moving average models
and the spectral expansion technique [2]. The regression method utilises the
tendency of the load pattern to correlate strongly with the weather pattern. The
weather-sensitive portion of the load is arbitrarily extracted and modelled by
a pre-determined functional relationship with weather variables. All the above
approaches use a large number of complex equations that require considerable
computational time.

X. Yao et al. (Eds.): SEAL’98, LNCS 1585, pp. 462-469, 1999.
© Springer-Verlag Berlin Heidelberg 1999

More recently, artificial neural network (ANN) techniques have
been used in many modelling problems [6-10]. One of the most popular training
algorithms for feed-forward ANNs is a gradient descent search algorithm,
for example, the back-propagation (BP) approach, which tries to minimise the
total Mean Square Error (MSE) between the actual output and the target output of an
ANN. This error is used to guide BP’s search in the weight space. There have
been some successful applications of BP algorithms in various areas. However,
the BP algorithm has drawbacks due to its gradient descent nature. It
often gets trapped in a local minimum of the error function and is very
inefficient in searching for a global minimum of a function which is vast, multimodal
and non-differentiable.

One way to overcome the shortcomings of BP and of other gradient descent
search-based training algorithms is to consider the training
process as the evolution of connection weights towards an optimal (or near-optimal)
set defined by a fitness function. From such a point of view, global search
procedures like GAs can be used effectively to train an ANN. Therefore, GA
integrated with ANN (GA-ANN) has been implemented to search for a solution.
Object Oriented Techniques (OOT) were the framework for integrating ANN
and GA. OOT gives us the ability to combine the existing objects (ANN object
and GA object) and create new components (GA-ANN).

2 GA and ANN Hybridisation 

There are three levels at which GA search procedures can be introduced to
ANNs, namely, connection weights and biases, architectures and training
algorithms. Here, GA has been used to optimise the connection weights and biases
of the neural network.



2.1 Optimising ANN Weights Using GA 

The GA training approach is divided into two major steps: the first is to
decide the representation scheme of connection weights, e.g., binary strings, and
the second is the evolution itself, driven by GA. Different representation
schemes and GAs can lead to quite different training performance in terms of
training time and accuracy. A typical cycle of the evolution of connection weights
with GA is shown in Figure 1.

2.2 Representation of Connection Weights 

When using GA the most convenient representation is binary, since GA usually
uses a binary representation (chromosomes) of the problem parameters and binary
operators for combination. The range of each free parameter depends on the
problem complexity and the required resolution of the system parameters.

Fig. 1. One typical cycle of evolution of connection weights with GA

A key issue here is to decide how much information about an architecture
should be encoded into a representation. This includes the number of layers and
the number of neurons in each layer. As more architecture parameters are encoded in
GA individuals, the computational cost increases. There is a trade-off
between these two factors, and the best combination differs for different classes of
problems.
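
To make the binary representation concrete, the following Python sketch decodes a chromosome into real-valued parameters. The parameter range [low, high], the plain (non-Gray) binary coding and the function name are assumptions of the sketch; the paper does not state which coding or range was used.

```python
def decode_chromosome(bits, n_params, bits_per_param, low=-1.0, high=1.0):
    """Map a flat list of 0/1 genes to n_params real values in [low, high].

    Each parameter occupies bits_per_param consecutive genes, read as an
    unsigned binary integer and scaled linearly onto the range.
    """
    if len(bits) != n_params * bits_per_param:
        raise ValueError("chromosome length does not match the layout")
    max_int = 2 ** bits_per_param - 1
    values = []
    for p in range(n_params):
        gene = bits[p * bits_per_param:(p + 1) * bits_per_param]
        as_int = int("".join(str(b) for b in gene), 2)
        values.append(low + (high - low) * as_int / max_int)
    return values


# Usage: two parameters of 10 bits each.
print(decode_chromosome([0] * 10 + [1] * 10, 2, 10))
```

The resolution of each parameter is (high - low) / (2^bits - 1), so more bits per parameter give finer weights at the price of a longer chromosome, which is the trade-off described above.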

2.3 GA and ANN Hybridisation 

Interactions between the developed ANN and GA components are presented in
order to explore possible benefits arising from their combination, instead of using
them individually. Object Oriented Technique gives us the ability to combine the
existing developed objects and create new components. In order to perform this
task, a thorough analysis of both objects should be done, including understanding
the principles of the hybrid systems, identifying objects which will remain
important in the life of the hybrid system and finally identifying the relationships
between the different objects and the ways in which the objects interact.

After analysing the system, which includes identifying the object interactions,
adaptation of classes in the new environment is performed. This task also
includes composing the ANN free parameters, including weights and biases, and encoding
them into chromosomes. The number of ANN free parameters is calculated as
shown in the equation below to form the chromosomes.

$$n_{free} = (n_{in} \times n_{hid}) + (n_{hid} \times n_{out}) + n_{hid} + n_{out} \tag{1}$$

where

$n_{in}$ = Number of Nodes in Input Layer
$n_{hid}$ = Number of Nodes in Hidden Layer
$n_{out}$ = Number of Nodes in Output Layer
$n_{free}$ = Number of Free Parameters

Other parameters such as architecture and training algorithms can be added
to the chromosomes as an extension. The interaction between the ANN and GA
objects is performed by message passing. Both ANN and GA instances are created
at the beginning of the optimisation procedure and last until the end. The
GA object makes calls to the ANN object and passes messages to the fitness function.
The optimisation proceeds to find a near-optimal global solution for each
applied problem. Our experience is that GA-ANN is highly application dependent.
The approach is tested on a parabolic function approximation and on
electric load forecasting, as explained in the following section.
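
The interaction between the two objects can be illustrated with a compact Python sketch. It is not the authors' implementation: the class names, the parameter layout, the weight range [-5, 5], tournament selection, elitism, single-point crossover and per-chromosome mutation are all assumptions chosen to keep the example short. The GA object only ever asks the ANN object for the error of a decoded weight vector.

```python
import math
import random


class ANN:
    """One-hidden-layer perceptron with sigmoid units."""

    def __init__(self, n_in, n_hid, n_out):
        self.n_in, self.n_hid, self.n_out = n_in, n_hid, n_out
        self.n_free = n_in * n_hid + n_hid * n_out + n_hid + n_out

    def forward(self, params, x):
        # Assumed layout: input-hidden weights, hidden biases,
        # hidden-output weights, output biases.
        sig = lambda z: 1.0 / (1.0 + math.exp(-z))
        pos, sums = 0, []
        for _ in range(self.n_hid):
            w = params[pos:pos + self.n_in]
            pos += self.n_in
            sums.append(sum(wi * xi for wi, xi in zip(w, x)))
        hidden = [sig(s + params[pos + j]) for j, s in enumerate(sums)]
        pos += self.n_hid
        outs = []
        for _ in range(self.n_out):
            w = params[pos:pos + self.n_hid]
            pos += self.n_hid
            outs.append(sum(wi * hi for wi, hi in zip(w, hidden)))
        return [sig(o + params[pos + k]) for k, o in enumerate(outs)]

    def mse(self, params, data):
        err = sum((y - t) ** 2
                  for x, target in data
                  for y, t in zip(self.forward(params, x), target))
        return err / (len(data) * self.n_out)


class GA:
    """Binary GA whose fitness is the training error reported by the ANN."""

    def __init__(self, ann, data, pop_size=20, bits=10, p_mut=0.1,
                 p_cross=0.9, low=-5.0, high=5.0, seed=0):
        self.ann, self.data, self.bits = ann, data, bits
        self.p_mut, self.p_cross = p_mut, p_cross
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        length = ann.n_free * bits
        self.pop = [[self.rng.randint(0, 1) for _ in range(length)]
                    for _ in range(pop_size)]

    def decode(self, chrom):
        top = 2 ** self.bits - 1
        ints = (int("".join(map(str, chrom[p * self.bits:(p + 1) * self.bits])), 2)
                for p in range(self.ann.n_free))
        return [self.low + (self.high - self.low) * v / top for v in ints]

    def error(self, chrom):
        # The message passed to the ANN object: evaluate these weights.
        return self.ann.mse(self.decode(chrom), self.data)

    def step(self):
        ranked = sorted(self.pop, key=self.error)
        new_pop = [ranked[0][:]]  # elitism: keep the best individual
        while len(new_pop) < len(self.pop):
            a = min(self.rng.sample(ranked, 2), key=self.error)  # tournament
            b = min(self.rng.sample(ranked, 2), key=self.error)
            child = a[:]
            if self.rng.random() < self.p_cross:
                cut = self.rng.randrange(1, len(a))
                child = a[:cut] + b[cut:]
            if self.rng.random() < self.p_mut:  # flip one random bit
                i = self.rng.randrange(len(child))
                child[i] = 1 - child[i]
            new_pop.append(child)
        self.pop = new_pop
        return self.error(new_pop[0])


# Usage: a 1-5-1 network fitted to y = x^2 on [0, 1].
data = [([x / 10], [(x / 10) ** 2]) for x in range(11)]
ann = ANN(1, 5, 1)
ga = GA(ann, data)
history = [ga.step() for _ in range(30)]
print(ann.n_free, history[0], history[-1])
```

Because the best individual is always copied into the next generation, the reported error never increases from one generation to the next; how fast it falls depends on the assumed settings.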

3 GA-ANN Application 

3.1 Parabolic Function Approximation 

Parabolic function data (x and y values) were generated using MATLAB to
create training and testing data files. The following system parameters were found
to be the best for this problem: population size 200, ANN free parameters 16,
bits for each parameter 10, mutation rate 0.1, crossover rate 0.9, number of inputs to the
neural network 1, number of nodes in the hidden layer 5 and number of outputs 1.

A similar network was constructed using BP-ANN for comparison with GA-ANN.
Figure 2 compares the error functions of the BP and GA training schemes
over the first 100 generations of training.




Fig. 2. Comparison between GA and BP during training for 100 Generations (legend: GA-ANN, BP-ANN)



GA-ANN and BP-ANN were further trained to reduce the RMS error in
order to obtain better results. Figure 3 compares the error functions of the BP and GA
training schemes over the first 500 generations of
training. As shown in Figure 2, the GA-ANN system initially converges much faster
than BP-ANN; that is, it finds a near-global optimum with no significant
difference in computational time between the two training schemes. The GA
training scheme improves on BP in the first 100 generations
and shows very good convergence. However, as the number of GA generations increases,
the convergence speed falls rapidly. As Figure 3 shows, BP begins to converge
faster than GA after about 220 generations. Figures 4 and 5 show the outputs of
GA-ANN and BP-ANN for unseen data after 100 and 500 generations of training
respectively.

Fig. 3. Comparison between GA and BP during training for 500 Generations (legend: BP-ANN, GA-ANN)




Fig. 4. Comparison between actual, GA-ANN and BP-ANN outputs for unseen data
with 100 generations of trained GA-ANN and BP-ANN (horizontal axis: x-values; legend: Actual, GA-ANN, BP-ANN)



3.2 Short-term Load Forecasting 

There are 58 inputs to the developed model. The features taken into
account as input factors in the load forecast system are as follows. Two 24-hour
load records, of day i-1 and day i-2, give 48 inputs (the forecast day is day i). Six more inputs are
the maximum and minimum temperatures of days i, i-1 and i-2. Three other
inputs are binary codes that identify the seven days of the week. One binary code
is dedicated to holidays or any yearly special occasions that may affect the
forecast. In summary, the designed NN is of the MLP type and is used to learn
the relationship between the 58 inputs and 24 outputs.

Fig. 5. Comparison between actual, GA-ANN and BP-ANN outputs for unseen data
with 500 generations of trained GA-ANN and BP-ANN (legend: Actual, GA-ANN, BP-ANN)

The inputs are:

  Hourly loads for two days prior to the forecast day                   24
  Hourly loads for the day prior to the forecast day                    24
  Max. and min. temps for the two days prior to the forecast day         4
  Max. and min. temps for the forecast day                               2
  Day of the week                                                        3 bits
  Holiday                                                                1 bit

The outputs are:

  Load forecast for all 24 hours of the day                             24
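
As an illustration of how the 58 inputs listed above fit together, the following Python sketch assembles one input vector. The ordering of the inputs, the plain binary code for the day of the week and the function names are assumptions of the sketch; the text fixes only the counts.

```python
def day_of_week_bits(day):
    """Encode day 1..7 as three binary inputs (plain binary, an assumption)."""
    if not 1 <= day <= 7:
        raise ValueError("day must be in 1..7")
    return [(day >> 2) & 1, (day >> 1) & 1, day & 1]


def build_inputs(load_d2, load_d1, temps_d2, temps_d1, temps_d0, day, holiday):
    """Assemble the 58 network inputs for forecast day i.

    load_d2, load_d1 : 24 hourly loads for days i-2 and i-1
    temps_*          : (max, min) temperature for days i-2, i-1 and i
    day              : day of the week, 1..7
    holiday          : True for a holiday or special occasion
    """
    if len(load_d2) != 24 or len(load_d1) != 24:
        raise ValueError("each load record needs 24 hourly values")
    vec = list(load_d2) + list(load_d1)                          # 48 loads
    vec += list(temps_d2) + list(temps_d1) + list(temps_d0)      # 6 temperatures
    vec += day_of_week_bits(day)                                 # 3 bits
    vec.append(1 if holiday else 0)                              # 1 bit
    return vec


# Usage with made-up numbers, only to show the shape of the vector.
vec = build_inputs([0.5] * 24, [0.6] * 24, (30, 18), (31, 19), (29, 17), 5, True)
print(len(vec))
```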



The above values are normalised as indicated by Equation (2).

$$\text{Normalised Value} = \frac{\text{Actual Value} - \text{Min}}{\text{Max} - \text{Min}} \tag{2}$$

where Max and Min are the maximum and minimum of the attribute respectively.
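
A minimal Python sketch of this min-max normalisation for one attribute follows. The function name is hypothetical, and raising an error for a constant attribute (Max equal to Min) is a choice of the sketch, since the text does not say how that case is handled.

```python
def normalise(values):
    """Scale the values of one attribute linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("attribute is constant; Max - Min is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Usage: three hourly loads in arbitrary units.
print(normalise([10, 20, 30]))
```

The minimum maps to 0 and the maximum to 1, which keeps every input inside the output range of the sigmoid activation function.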

The mean square error (MSE) is used to measure the accuracy of the model.
The sigmoid activation function is adopted. The following system parameters
were found to be the best for this problem: population size 150, ANN free
parameters 1269, bits for each parameter 16, mutation rate 0.1, crossover rate 0.9, number of
inputs to the neural network 58, number of nodes in the hidden layer 15 and number
of outputs 24.
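
The size of this search problem can be checked directly from Equation (1) of Section 2.3. The short Python sketch below does the arithmetic; the function name is hypothetical, and the chromosome length assumes that every free parameter is coded with the stated number of bits and nothing else is stored in the chromosome.

```python
def n_free(n_in, n_hid, n_out):
    """Number of weights and biases of a one-hidden-layer network."""
    return n_in * n_hid + n_hid * n_out + n_hid + n_out


# Load forecasting network: 58 inputs, 15 hidden nodes, 24 outputs, 16 bits.
load_params = n_free(58, 15, 24)
load_bits = load_params * 16

# Parabolic network of Section 3.1: 1 input, 5 hidden nodes, 1 output, 10 bits.
parabola_params = n_free(1, 5, 1)
parabola_bits = parabola_params * 10

print(load_params, load_bits, parabola_params, parabola_bits)
```

The counts agree with the 1269 and 16 free parameters reported in the text. Under the stated assumption the load forecasting chromosome is about 127 times longer than the parabolic one (20304 bits against 160), which helps to explain the much heavier computational load.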

Back-propagation was also used to train another ANN, which is then
used as a reference to compare the two algorithms. The training
Mean Square Error (MSE) of both BP-ANN and GA-ANN for the first 100
generations is shown in Figure 6.

Figure 6 shows that the BP-ANN converges much faster than the GA-ANN.
This could mean that it finds a better optimum in fewer
iterations. Figure 7 shows the comparison between actual results and GA-ANN
outputs for unseen data.

Fig. 6. Comparison between GA-ANN and BP-ANN for 100 generations

Fig. 7. Comparison between actual and GA-ANN outputs for unseen data



4 Conclusions 

The main question is whether GA-ANN is more efficient than gradient-based
(e.g. BP) methods. In general, GA-ANN gives better solutions for
problems with a small number of parameters. For systems with a large number
of parameters, however, it becomes impractical because the computational
time and cost increase. If powerful computing facilities are
available, then GA-ANN is generally the preferred method. Parallel GA-ANN
is one way to reduce training time. The OO methodology is a very
useful framework for the development of GA-ANN as it reduces the development
time. The developed OO models of all algorithms give the flexibility to upgrade
and maintain the software constantly and to form different configurations.

Artificial Neural Networks and Genetic Algorithms have been used to design a neural
network for short-term load forecasting. The forecasting model has been used
to produce a forecast of the load in the 24 hours of the forecast day concerned,
using data provided by an Italian power company. The results obtained are
promising. In this particular case, the comparison between the results from the
GA-ANN and BP-ANN shows that the GA-ANN does not provide a faster solution
than the BP-ANN. This could be because the initial randomly
selected starting point is a poor one. The size of the problem is very large, and
so are the amount of memory and the computation time required. This points to
the direction of parallel processing techniques being integrated with evolutionary
computing to solve complex practical problems.

5 Acknowledgements 

The authors would like to express their thanks to M. Sforna and M. Caciotta of
the Electric and Automation Department, ENEL Research, Italy, for providing the
data.

References 

1. Gross, G., Galiana, F.D.: Short term load forecasting. Proc. of IEEE, Vol. 75, No.
12, 1987, pp. 1558-1573

2. Hagan, M.T., Behr, S.M.: The time series approach to short term load forecasting.
IEEE Transactions on Power Systems, Vol. 2, No. 3, 1987, pp. 785-791

3. Papalexopoulos, A.D., Hesterberg, T.C.: A regression-based approach to short-term
load forecasting. IEEE Transactions on Power Systems, Vol. 5, No. 4, 1990,
pp. 1535-1547

4. Rahman, S., Bhatnagar, R.: An expert system based algorithm for short-term load
forecast. IEEE Transactions on Power Systems, Vol. 3, No. 2, 1988, pp. 392-399

5. Dehdashti, A.S., Tudor, J.R., Smith, M.G.: Forecasting of hourly load by pattern
recognition: a deterministic approach. IEEE Transactions on Power Apparatus and
Systems, Vol. 101, 1982, pp. 900-910

6. Caciotta, M., Lamedica, R., Cencelli, V.O., Prudenzi, A., Sforna, M.: Application
of artificial neural networks to historical data analysis for short-term electric load
forecasting. European Transactions on Electrical Power, Vol. 7, 1997, pp. 49-56

7. Maifield, T., Sheble, G.: Short term load forecasting by neural network and a
refined genetic algorithm. Electrical Power Systems Research, Vol. 31, pp. 9-14

8. Lai, L.L., Sichanie, A.G., Rajkumar, N., Styvaktakis, E., Sforna, M., Caciotta, M.:
Practical application of object oriented techniques to designing neural networks
for short-term electric load forecasting. Proceedings of the Energy Management
and Power Delivery Conference, IEEE Catalogue No 98EX137, March 1998, pp.
559-563

9. Heng, E.T.H., Srinivasan, D., Liew, A.C.: Short term load forecasting using genetic
algorithm and neural networks. Proceedings of the Energy Management and Power
Delivery Conference, IEEE Catalogue No 98EX137, March 1998, pp. 576-581

10. Lai, L.L.: Intelligent system applications in power engineering - evolutionary
programming and neural networks. John Wiley and Sons, 1998




Author Index 



Akira, Y., 446 
Anbarasu, L.A., 130 
Aoki, K., 260 
Araki, K., 325 

Baba, T., 251 
Barruncho, L.M.F., 58
Bergmann, N., 114 
Blair, A., 357 
Blair, A.D., 389 
Boman, M., 285 
Bundaleski, N., 365 
Burke, E., 187 
Burke, E.K., 66 

Carlsson, B., 285 
Carvalho, P.M.S., 58
Chen, S-H., 293
Cho, D-Y., 146 
Cho, S-B., 301, 413 
Cooper, M., 171 

De Causmaecker, P., 187 
de Oliveira, P.P.B., 268 
Djurišić, A.B., 365

Estivill-Castro, V., 18 

Ferreira, L.A.F.M., 58
Fujimoto, Y., 179, 223 
Fujinaga, T., 231 
Funes, P., 389 
Furuhashi, T., 206 

Garionis, R., 98 
Gen, M., 276, 421 
Gero, J.S., 381 
Geyer, H., 106 
Graco, W., 74
Green, D.G., 90 
Gwyn, B.J., 462 

Hall, R., 438 
Hallinan, J., 397 
Hashimoto, H., 240, 251 
Hashimoto, K., 260 



He, H., 74 
Hoashi, K., 260 
Horii, H., 122 
Husbands, P., 268 

Ida, K., 421 
Imada, A., 325 
Inuzuka, N., 231 
Ishibuchi, H., 82, 317 
Ishinishi, M., 162 
Itoh, H., 231 
Iwasaki, A., 309 
Iyori, K., 454

Jackway, P., 397 
Johansson, S., 285 
Jonsson, M.T., 430 

Kaise, N., 223
Katai, O., 198 
Kawakami, H., 198 
Kim, D-H., 154 
Kim, J-H., 2, 154 
Kim, Y-J., 154 
Kirley, M., 90 
Kohata, N., 251 
Konishi, T., 198 
Kunifuji, S., 122 

Lai, L.L., 462 
Lee, C-Y., 421 
Li, A., 341 
Li, E.H., 365 
Li, X., 90 
Liang, H-K., 42 
Liu, Y., 333 

Majewski, M.L., 365 
Mandischer, M., 106 
Matsumoto, K., 260 
Matsumura, Y., 10 
Matsuzawa, T., 122 
Miura, K., 454 
Moriwaki, K., 231 
Myung, H., 2 







Nakano, R., 50 
Nakashima, T., 82 
Namatame, A., 162 
Narayanasamy, P., 130 
Newton, C.S., 42 
Ni, C-C., 293 
Nii, M., 317 
Niwa, T., 349 

Oda, S.H., 309, 454 
Ohkura, K., 10 

Pacheco, M.A., 373 
Pham, B., 26, 438 
Podlich, D.W., 171 
Porter, R., 114 
Porto, V.W., 215 

Rajkumar, N., 462 
Rakic, A.D., 365 
Reiser, P.G.K., 138 
Riddle, P.J., 138 
Runarsson, T.P., 430 

Sasaki, T., 34 
Sato, M., 240, 251 
Schwefel, H-P., 1
Seo, Y-G., 301 
Shimohara, K., 413 
Shimooka, H., 179 
Sklar, E., 389 
Sood, V.K., 462 
Stanic, B.V., 365 
Subasinghe, H., 462 



Sundararajan, V., 130 
Sutton, R.S., 195 

Tachibana, K., 206 
Takagi, T., 240 
Tanaka, K., 317 
Tanaka, M., 349 
Tokoro, M., 34 
Tonkes, B., 357 
Torres-Velazquez, R., 18
Tsujimura, Y., 276 

Ueda, K., 10, 309, 454 
Ulbig, P., 106 

Vanden Berghe, G., 187 
Varley, D.B., 66 
Vaseekar, E., 462 
Vellasco, M., 373 

Wiles, J., 357 
Wong, K-P., 341, 405

Yamada, T., 50 
Yamaguchi, T., 240, 251 
Yao, X., 42, 74, 333 
Yearwood, J., 438 
Yoshimura, K., 50 
Yuryevich, J., 405 

Zebulum, R.S., 373 
Zhang, B-T., 146 
Zhang, Z., 26 




